Abstract
Medical image segmentation requires both high accuracy and computational efficiency, especially in resource-constrained environments. This paper introduces RepSegNet, a novel deep learning model optimized for medical image segmentation. RepSegNet integrates convolutional neural networks with reparameterization techniques, effectively capturing both local and long-range features while simplifying complex structures during inference. Extensive experiments on diverse medical imaging datasets demonstrate RepSegNet’s superior performance over state-of-the-art models in key segmentation metrics. The model’s lightweight architecture ensures scalability and real-time applicability on edge devices, substantially reducing parameters and computational cost during inference. RepSegNet thus offers a robust, efficient, and scalable solution across diverse clinical applications; its ability to maintain high accuracy while reducing computational demands paves the way for improved diagnostic processes and integration into real-time medical imaging systems. Comprehensive ablation studies validate both architectural components, with reparameterization providing an 80.5% parameter reduction and MultiPathMobileBlocks contributing an average improvement of 8.7 F1 points across all medical imaging modalities.
Introduction
Medical image segmentation is crucial in modern healthcare for diagnosis, treatment planning, and monitoring. It involves precise delineation of anatomical structures and pathological regions, providing invaluable insights for healthcare professionals1,2. Despite its importance, medical image segmentation faces challenges due to image complexity, anatomical variability, and pathological changes. Traditional methods and manual segmentation have limitations in accuracy, consistency, and scalability3. This has led to the development of automated approaches, particularly using deep learning techniques like Convolutional Neural Networks (CNNs)2. Since the introduction of U-Net4, various architectures have emerged, including UNet++5, Attention U-Net6, and TransUNet7, each addressing specific challenges in medical image segmentation. While these advancements have significantly improved segmentation accuracy, they often come at the cost of increased computational complexity. Many state-of-the-art models require substantial computational resources, limiting their applicability in real-time scenarios and on resource-constrained devices8,9,10. In response to these challenges and evolving needs, we introduce RepSegNet, a novel deep learning model designed specifically for medical image segmentation. The model’s architecture is built upon the integration of CNNs with advanced reparameterization techniques, striking a balance between the feature extraction capabilities of CNNs and the efficiency gains offered by reparameterization.
At the core of RepSegNet’s architecture are two key components: the MultiPathMobileBlock and the RepMobileunit. The MultiPathMobileBlocks are designed to capture multi-scale features efficiently, allowing the model to learn both fine-grained details and broader contextual information simultaneously. The RepMobileunit leverages reparameterization techniques to simplify the model’s structure during inference. During training, these blocks maintain a complex, multi-branch architecture that enhances the model’s learning capacity. At inference time, this structure is collapsed into a simple, single-branch architecture. RepSegNet’s design addresses several key challenges in medical image segmentation:
- Accuracy: By effectively capturing both local details and long-range dependencies, RepSegNet achieves high segmentation accuracy across diverse medical imaging tasks. This is crucial for reliable clinical decision-making, where precise delineation of anatomical structures and pathological regions is paramount.
- Efficiency: The reparameterization technique employed in RepSegNet allows for a significant reduction in computational demands during inference. This efficiency makes real-time segmentation feasible, even on edge devices, facilitating rapid analysis and decision-making in clinical settings.
- Scalability: The lightweight nature of RepSegNet’s inference model ensures that it can be deployed across a wide range of computational environments. This scalability is essential for the model’s adoption in various clinical settings, from well-equipped research hospitals to resource-constrained rural clinics.
- Validation: Comprehensive ablation studies demonstrate the necessity of both reparameterization (deployment efficiency) and the MultiPathMobileBlock (multi-scale accuracy) across diverse medical imaging tasks.
Related work
Evolution of medical image segmentation techniques
Medical image segmentation has evolved significantly from manual delineation to advanced automated techniques. Early computerized methods included thresholding11, region-growing12, and edge detection13. More sophisticated approaches followed, such as active contour models14, level set methods15, and statistical techniques like Markov Random Fields16. Despite these advancements, traditional methods struggled with the inherent complexities of medical images, including low contrast, noise artifacts, and anatomical variability17. As medical imaging data volume and diversity increased, the limitations of conventional techniques became more apparent18. These challenges paved the way for machine learning and deep learning applications in medical image segmentation. These advanced methods promised to overcome many previous limitations, marking a new era in medical image analysis2.
Deep learning in medical image segmentation
The U-Net architecture established the encoder-decoder design with skip connections as a standard for medical image segmentation4 and has been widely adopted. This design effectively fuses deep, semantic features from the decoder path with shallow, high-resolution features from the encoder path. Subsequent research has focused on refining this paradigm to address specific limitations. One direction involves enhancing the skip connections to mitigate the semantic gap between encoder and decoder feature maps; architectures such as UNet++ introduce dense connectivity along these pathways to improve feature propagation and gradient flow5. Another line of work incorporates attention mechanisms to improve model focus: Attention U-Net, for example, integrates attention gates that learn to suppress irrelevant image regions while highlighting salient features for the segmentation task20. More recently, to overcome the limited receptive field of CNNs, hybrid models such as TransUNet have been proposed, which leverage Transformers to model long-range dependencies and capture global context7. While these architectural advancements improve segmentation accuracy, they often increase model complexity, creating a trade-off with the computational efficiency required for clinical deployment1.
Efficiency-focused architectures
As the demand for real-time medical image segmentation grows, particularly in resource-constrained environments, the focus has shifted towards developing efficient architectures that balance accuracy and computational cost. This trend has given rise to a new class of models designed specifically for mobile and edge devices, where processing power and memory are limited. MobileNets, introduced by Howard et al.21, pioneered the use of depthwise separable convolutions in deep learning architectures. This approach significantly reduced the number of parameters and computational complexity while maintaining competitive accuracy. The success of MobileNets inspired a series of improvements, including MobileNetV222, which introduced inverted residuals and linear bottlenecks to further enhance efficiency. EfficientNet, proposed by Tan and Le23, took a different approach by systematically scaling network width, depth, and resolution to achieve state-of-the-art accuracy with much smaller models. This compound scaling method has proven effective in various medical imaging tasks, including segmentation24. In the realm of medical image segmentation, lightweight U-Net variants have emerged. For instance, MobileNetV2-UNet25 combines the efficiency of MobileNetV2 with the U-Net architecture, demonstrating impressive results in various segmentation tasks while significantly reducing computational requirements. Similarly, EfficientUNet26 leverages the EfficientNet backbone to create a more efficient segmentation model. Recent work by Ma et al.27 introduced MobileSUNet, which employs a super-resolution module to enhance low-resolution feature maps, enabling efficient segmentation of high-resolution medical images. This approach addresses the challenge of maintaining high accuracy while processing large medical images with limited computational resources. Despite these advancements, the development of efficiency-focused architectures for medical image segmentation remains an active area of research. The ongoing challenge lies in further reducing model size and computational complexity without compromising segmentation accuracy, especially for complex anatomical structures and diverse imaging modalities8.
Reparameterization techniques
Structural reparameterization addresses the accuracy-efficiency trade-off by decoupling the training-time architecture from the inference-time architecture. The core principle is algebraic equivalence. During training, a multi-branch topology with diverse convolutional paths provides a richer hypothesis space and improved gradient flow. These branches, combined with batch normalization layers, can be mathematically transformed into a single equivalent convolutional layer. The transformation absorbs batch normalization parameters (mean, variance, scale, shift) into the convolutional weights through linear operations. Since convolution is a linear operation before activation functions, multiple parallel convolutional branches sum to produce an equivalent single convolution with merged weights. This equivalence means the inference-time single-branch network produces identical outputs to the training-time multi-branch network for any input, while eliminating branching overhead and memory access costs.

RepVGG demonstrates this framework by training networks with parallel 3\(\times\)3 convolution, 1\(\times\)1 convolution, and identity branches39. After training, these branches are algebraically merged into a single 3\(\times\)3 convolution: the 1\(\times\)1 convolution is zero-padded to 3\(\times\)3 dimensions, the identity branch is converted to a 3\(\times\)3 identity kernel, and all branches are summed element-wise to produce the final kernel. This conversion preserves mathematical equivalence while eliminating runtime branching overhead.

Diverse Branch Block (DBB) extends reparameterization to heterogeneous structures including sequences of convolutions, multi-scale kernels, and average pooling41. Each branch type is transformed into an equivalent 3\(\times\)3 kernel through specific conversion rules: sequences are merged through kernel composition, average pooling is represented as a uniform-weight convolution, and different kernel sizes are handled through appropriate padding. The transformed kernels are summed to produce a unified representation.

RepMobileNet adapts these principles to depthwise separable convolutions40. Since depthwise and pointwise convolutions operate on different channel configurations, the method applies reparameterization independently to each component: the depthwise branches merge into a single depthwise kernel, and the pointwise branches merge into a single pointwise kernel, maintaining the efficiency characteristics of mobile architectures. This training-inference decoupling expands model capacity during optimization without compromising deployment efficiency, which is essential for real-time medical image segmentation on resource-constrained devices.
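To make the fusion concrete, the following minimal PyTorch sketch reproduces the RepVGG-style merge on a toy layer: each branch’s batch normalization is folded into its kernel, the 1\(\times\)1 kernel is zero-padded to 3\(\times\)3, the identity branch becomes a centered identity kernel, and the resulting single convolution reproduces the multi-branch output. The channel count and shapes are illustrative assumptions, not the configuration used in RepSegNet.

```python
# Minimal sketch of RepVGG-style branch fusion (toy shapes, not RepSegNet's config).
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_conv_bn(conv_weight, bn):
    """Absorb a BatchNorm layer (in eval mode) into a preceding conv's weight/bias."""
    std = torch.sqrt(bn.running_var + bn.eps)
    w = conv_weight * (bn.weight / std).reshape(-1, 1, 1, 1)
    b = bn.bias - bn.running_mean * bn.weight / std
    return w, b

C = 8
conv3 = nn.Conv2d(C, C, 3, padding=1, bias=False)
bn3 = nn.BatchNorm2d(C)
conv1 = nn.Conv2d(C, C, 1, bias=False)
bn1 = nn.BatchNorm2d(C)
bn_id = nn.BatchNorm2d(C)  # identity branch is BN only

for m in (bn3, bn1, bn_id):  # use running statistics, as at inference time
    m.eval()

x = torch.randn(1, C, 16, 16)
with torch.no_grad():
    y_multi = bn3(conv3(x)) + bn1(conv1(x)) + bn_id(x)

    # Fuse each branch into an equivalent 3x3 kernel and bias.
    w3, b3 = fuse_conv_bn(conv3.weight, bn3)
    w1, b1 = fuse_conv_bn(conv1.weight, bn1)
    w1 = F.pad(w1, [1, 1, 1, 1])           # zero-pad the 1x1 kernel to 3x3
    id_kernel = torch.zeros(C, C, 3, 3)
    for i in range(C):
        id_kernel[i, i, 1, 1] = 1.0        # identity as a centered 3x3 kernel
    w_id, b_id = fuse_conv_bn(id_kernel, bn_id)

    w, b = w3 + w1 + w_id, b3 + b1 + b_id  # element-wise sum of all branches
    y_single = F.conv2d(x, w, b, padding=1)

print(torch.allclose(y_multi, y_single, atol=1e-5))  # True: outputs match
```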
Challenges in real-time medical image segmentation
Despite significant advancements, achieving real-time performance while maintaining high accuracy in medical image segmentation remains challenging. Key issues include the trade-off between model complexity and inference speed1,2; deployment in resource-constrained environments, which limits applicability in portable devices and remote facilities8,9; and maintaining accuracy across diverse imaging modalities and anatomical structures28,29. Addressing these challenges requires innovative approaches that balance accuracy, efficiency, and clinical applicability. Solutions must optimize model architectures, leverage efficient inference techniques, and adapt to diverse imaging scenarios while maintaining high performance to advance real-time medical image segmentation.
Methodology
Model architecture
Figure 1 shows the RepSegNet architecture. RepSegNet is an advanced segmentation model designed to efficiently capture multi-scale features through its unique combination of MultiPathMobileBlocks and skip connections. The encoder consists of sequential MultiPathMobileBlocks, which apply depthwise separable convolutions to reduce computational complexity while maintaining high feature extraction capability. Each encoding stage is downsampled using max pooling. The bridge connects the encoder and decoder, facilitating the transition between high-level and detailed features. The decoder employs upsampling and concatenates features from corresponding encoder stages via skip connections, enhancing spatial resolution and context. This architecture ensures efficient, precise segmentation by leveraging multi-scale feature integration; a schematic sketch of this dataflow follows the figure caption.
RepSegNet architecture. Overall architecture of the proposed RepSegNet showing the encoder, bridge, and decoder components with skip connections. The encoder consists of sequential MultiPathMobile blocks with downsampling, while the decoder employs upsampling operations and concatenates features from corresponding encoder stages via skip connections. This structure enables efficient multi-scale feature extraction and integration for precise segmentation.
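To illustrate the overall dataflow, the following schematic PyTorch sketch mirrors the encoder-bridge-decoder structure of Fig. 1 with skip connections. A plain convolution block stands in for the MultiPathMobileBlock, and the stage count and channel widths are assumptions for illustration, not the published implementation.

```python
# Schematic encoder-bridge-decoder skeleton with skip connections.
import torch
import torch.nn as nn

def block(c_in, c_out):  # stand-in for a MultiPathMobileBlock
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class RepSegNetSkeleton(nn.Module):
    def __init__(self, num_classes=1):
        super().__init__()
        self.e1, self.e2, self.e3 = block(3, 16), block(16, 32), block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.bridge = block(64, 64)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        # Each decoder stage sees upsampled features concatenated with its skip.
        self.d3, self.d2, self.d1 = block(64 + 64, 32), block(32 + 32, 16), block(16 + 16, 16)
        self.head = nn.Conv2d(16, num_classes, 1)

    def forward(self, x):
        s1 = self.e1(x)                               # skip 1 (full resolution)
        s2 = self.e2(self.pool(s1))                   # skip 2 (1/2 resolution)
        s3 = self.e3(self.pool(s2))                   # skip 3 (1/4 resolution)
        b = self.bridge(self.pool(s3))                # bridge (1/8 resolution)
        d = self.d3(torch.cat([self.up(b), s3], dim=1))
        d = self.d2(torch.cat([self.up(d), s2], dim=1))
        d = self.d1(torch.cat([self.up(d), s1], dim=1))
        return self.head(d)

print(RepSegNetSkeleton()(torch.randn(1, 3, 256, 256)).shape)  # [1, 1, 256, 256]
```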
RepMobileunit
The RepMobileunit, illustrated in Fig. 2, is a core component of RepSegNet that optimizes computational efficiency and segmentation performance through depthwise and pointwise convolutions. During training, it employs a multi-branch structure with multiple convolutions, batch normalization, and ReLU activations, enabling rich feature learning. At inference, it transitions to a simpler structure through reparameterization, consolidating multiple layers into single equivalent operations for each convolution type. This dual-mode capability allows the RepMobileunit to leverage complex architectures for robust feature learning during training while benefiting from a streamlined, computationally efficient structure during inference, making it particularly suitable for real-time image segmentation tasks requiring both high accuracy and low latency. The RepMobileunit draws inspiration from the structural reparameterization methodology introduced in RepVGG39 and subsequently adapted for mobile-efficient architectures in RepMobileNet40. Following the Diverse Branch Block (DBB)41 framework, we implement a multi-branch training structure that can be equivalently transformed into a single convolutional layer for inference.
RepMobileunit structure. (a) The RepMobileunit during training phase with multi-branch structure utilizing multiple convolutions, batch normalization, and ReLU activations. (b) The simplified RepMobileunit after reparameterization for the inference phase, where multiple layers are consolidated into single equivalent operations. This dual-mode capability allows complex architectures for robust feature learning during training while providing a streamlined structure during inference.
During training, the RepMobileunit applies multi-branch reparameterization separately to the depthwise and pointwise components of the depthwise separable convolution.
For the depthwise stage:

\[O_{\text{dw}} = \text{ReLU}\!\left(\sum_{i=1}^{n} \text{BN}_i\!\left(W^{3\times 3}_{i} \ast I\right) + \text{BN}_{1\times 1}\!\left(W^{1\times 1} \ast I\right) + \text{BN}_{\text{id}}(I)\right)\]

For the pointwise stage:

\[O = \text{ReLU}\!\left(\sum_{j=1}^{m} \text{BN}_j\!\left(P^{1\times 1}_{j} \ast O_{\text{dw}}\right) + \text{BN}_{\text{id}}(O_{\text{dw}})\right)\]
where \(I\) is the input, \(O_{\text {dw}}\) is the output of the depthwise stage, \(O\) is the final output, \(W\) and \(P\) denote depthwise and pointwise kernels, \(n\) is the number of depthwise 3\(\times\)3 branches, and \(m\) is the number of pointwise 1\(\times\)1 branches. The identity branches \(BN_{\text {id}}\) are present only when input and output channels match and stride equals 1.
During inference, each stage is independently reparameterized. The batch normalization transformation for each branch is:

\[\text{BN}(x) = \gamma \cdot \frac{x - \mu}{\sqrt{\sigma^{2} + \epsilon}} + \beta,\]
where \(\gamma\) and \(\beta\) are learned scale and shift parameters, \(\mu\) and \(\sigma ^2\) are running mean and variance, and \(\epsilon\) is a small constant for numerical stability.
For the depthwise stage, each convolutional branch is fused by absorbing its BN parameters into the kernel and bias:

\[W'_{i} = \frac{\gamma_{i}}{\sqrt{\sigma_{i}^{2} + \epsilon}}\, W_{i}, \qquad b'_{i} = \beta_{i} - \frac{\gamma_{i}\, \mu_{i}}{\sqrt{\sigma_{i}^{2} + \epsilon}}.\]
The same fusion applies to the 1\(\times\)1 depthwise branch and the identity branch. The 1\(\times\)1 kernel is zero-padded to 3\(\times\)3 dimensions, and all fused branches are summed:

\[W_{\text{dw}} = \sum_{i} W'_{i}, \qquad b_{\text{dw}} = \sum_{i} b'_{i}.\]
Similarly, for the pointwise stage, all 1\(\times\)1 branches (including identity) are fused and summed:

\[W_{\text{pw}} = \sum_{j} P'_{j}, \qquad b_{\text{pw}} = \sum_{j} b'_{j}.\]
The inference-time operation becomes two sequential convolutions:

\[O = \text{ReLU}\!\left(W_{\text{pw}} \ast \text{ReLU}\!\left(W_{\text{dw}} \ast I + b_{\text{dw}}\right) + b_{\text{pw}}\right).\]
This reparameterization retains the computational efficiency of two simple convolutions while preserving the enhanced representational capacity learned by the multi-branch training structure.
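The stage-wise fusion can be sketched as follows for a toy depthwise separable unit: the \(n\) depthwise branches collapse into one depthwise kernel and the \(m\) pointwise branches into one pointwise kernel. For brevity the sketch omits the 1\(\times\)1 depthwise and identity branches, and the branch counts and channel width are illustrative assumptions.

```python
# Hedged sketch of two-stage fusion for a depthwise separable unit (toy sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F

def fold_bn(weight, bn):
    """Fold eval-mode BN parameters into the preceding kernel and bias."""
    std = torch.sqrt(bn.running_var + bn.eps)
    return (weight * (bn.weight / std).reshape(-1, 1, 1, 1),
            bn.bias - bn.running_mean * bn.weight / std)

C, n, m = 8, 2, 2   # channels, depthwise 3x3 branches, pointwise 1x1 branches
dw = [(nn.Conv2d(C, C, 3, padding=1, groups=C, bias=False), nn.BatchNorm2d(C).eval())
      for _ in range(n)]
pw = [(nn.Conv2d(C, C, 1, bias=False), nn.BatchNorm2d(C).eval())
      for _ in range(m)]

x = torch.randn(1, C, 16, 16)
with torch.no_grad():
    # Training-time multi-branch forward pass (ReLU after each stage).
    o_dw = F.relu(sum(bn(conv(x)) for conv, bn in dw))
    y_multi = F.relu(sum(bn(conv(o_dw)) for conv, bn in pw))

    # Stage-wise fusion: depthwise branches -> one depthwise kernel; same for pointwise.
    w_dw = sum(fold_bn(c.weight, b)[0] for c, b in dw)
    b_dw = sum(fold_bn(c.weight, b)[1] for c, b in dw)
    w_pw = sum(fold_bn(c.weight, b)[0] for c, b in pw)
    b_pw = sum(fold_bn(c.weight, b)[1] for c, b in pw)

    # Inference-time operation: two sequential convolutions.
    y_single = F.relu(F.conv2d(
        F.relu(F.conv2d(x, w_dw, b_dw, padding=1, groups=C)), w_pw, b_pw))

print(torch.allclose(y_multi, y_single, atol=1e-5))  # True: identical outputs
```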
MultiPathMobileBlock
Figure 3 shows the MultiPathMobileBlock, an innovative neural network module designed to enhance feature extraction efficiency by leveraging RepMobileunits in a structured manner. Unlike traditional Inception modules, which utilize parallel convolutions of varying sizes (e.g., 3x3, 5x5, 7x7), this block exclusively employs a series of 3x3 convolutions. The key advancement lies in the reuse of previous convolution outputs, optimizing computational resources while maintaining rich feature extraction. The block begins with a parallel 1x1 convolution and three sequential RepMobileunits. These units process the input data through depthwise convolutions followed by pointwise convolutions, significantly reducing the number of parameters and computations compared to standard convolutions. The output of each RepMobileunit is fed into the next, allowing the block to refine features progressively, while the input is directly fed into the first RepMobileunit, ensuring a diverse set of features is captured at each stage. The outputs of the 1x1 convolution and the three RepMobileunits are concatenated along the channel dimension. This concatenation integrates multi-scale features, mimicking the traditional Inception module’s ability to capture fine and coarse details. The concatenated output is then batch normalized and passed through a ReLU activation function, ensuring the stability and non-linearity of the feature maps. By reusing the 3x3 convolution outputs and integrating them effectively, the MultiPathMobileBlock achieves high computational efficiency and robust feature extraction; a dataflow sketch follows the figure caption below. This approach not only minimizes redundancy but also enhances the model’s capability to learn complex patterns from the input data.
MultiPathMobileBlock design. The MultiPathMobileBlock featuring a 1x1 convolution in parallel with three sequential RepMobileunits. The outputs of each computational path are concatenated along the channel dimension and processed through batch normalization and ReLU activation. This approach optimizes computational resources while maintaining rich feature extraction capability.
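A minimal dataflow sketch of the block follows; a plain depthwise separable convolution stands in for the RepMobileunit, and the channel widths are illustrative assumptions rather than the published configuration.

```python
# Minimal dataflow sketch of a MultiPathMobileBlock (stand-in units, toy widths).
import torch
import torch.nn as nn

class DepthwiseSeparable(nn.Module):          # stand-in for a RepMobileunit
    def __init__(self, c_in, c_out):
        super().__init__()
        self.dw = nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in, bias=False)
        self.pw = nn.Conv2d(c_in, c_out, 1, bias=False)
    def forward(self, x):
        return self.pw(self.dw(x))

class MultiPathMobileBlock(nn.Module):
    def __init__(self, c_in, c_branch):
        super().__init__()
        self.proj = nn.Conv2d(c_in, c_branch, 1, bias=False)   # parallel 1x1 path
        self.unit1 = DepthwiseSeparable(c_in, c_branch)
        self.unit2 = DepthwiseSeparable(c_branch, c_branch)
        self.unit3 = DepthwiseSeparable(c_branch, c_branch)
        self.bn = nn.BatchNorm2d(4 * c_branch)
        self.act = nn.ReLU(inplace=True)
    def forward(self, x):
        p0 = self.proj(x)        # pointwise features
        p1 = self.unit1(x)       # ~3x3 receptive field
        p2 = self.unit2(p1)      # ~5x5 via reuse of p1
        p3 = self.unit3(p2)      # ~7x7 via reuse of p2
        return self.act(self.bn(torch.cat([p0, p1, p2, p3], dim=1)))

block = MultiPathMobileBlock(16, 16)
print(block(torch.randn(1, 16, 64, 64)).shape)  # torch.Size([1, 64, 64, 64])
```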
Inception by residual connections
Comparison of different Inception architectures. (a) Parallel Inception with separate convolution kernels operating independently. (b) Sequence Inception using stacked convolutions to approximate larger receptive fields. (c) Inception by residual connection, our approach that achieves similar receptive field coverage with significantly reduced parameter count and computational requirements.
The utilization of multiple convolution kernel sizes in CNNs has emerged as a pivotal technique for enhancing feature extraction and improving model performance. This multi-scale strategy enables the capture of both fine-grained local features and broader contextual information, and it led to state-of-the-art results on the ImageNet dataset30. Subsequent work on Inception-v3 demonstrated that factorizing larger convolutions into sequences of smaller ones could maintain receptive field size while reducing computational overhead31.
For a depthwise separable convolution, i.e., a depthwise convolution followed by a pointwise convolution, let \(M\) be the number of input channels, \(N\) the number of output channels, and \(D_K\) the kernel size (\(D_K = 3\) for \(3 \times 3\) convolutions). The depthwise convolution contributes \(D_K \times D_K \times M\) parameters, so the total number of parameters for the depthwise separable convolution is:

\[P_{\text{dsc}} = D_K^{2} \cdot M + M \cdot N = 9M + MN,\]

where \(D_K^{2} = 9\) for \(3 \times 3\) kernels.
Figure 4(a) represents the parallel inception method, in which \(3\times3\), \(5\times5\), and \(7\times7\) convolutions operate independently on the input. The total number of parameters is

\[P_{\text{parallel}} = (3^{2} + 5^{2} + 7^{2}) \cdot M \cdot N = 83MN.\]
Using sequences of \(3\times3\) convolutions to approximate the \(5\times5\) and \(7\times7\) convolutions, as in Fig. 4(b), results in a smaller parameter count:

\[P_{\text{sequence}} = (1 + 2 + 3) \cdot 3^{2} \cdot M \cdot N = 54MN.\]
Figure 4(c) shows the use of residual connections to approximate the \(5\times5\) and \(7\times7\) convolutions by reusing the outputs of three sequential \(3\times3\) convolutions:

\[P_{\text{residual}} = 3 \cdot 3^{2} \cdot M \cdot N = 27MN.\]
To quantify the reduction in parameters, we take the ratio of parameters between the depthwise separable convolution and an inception method; this ratio is called the reduction factor. For the residual inception method:

\[\frac{P_{\text{dsc}}}{P_{\text{residual}}} = \frac{9M + MN}{27MN} = \frac{9}{27N} + \frac{1}{27}.\]

For large \(N\), the first term \(\frac{9}{27N}\) becomes negligible, so the reduction factor approximates to \(\frac{1}{27}\).
This indicates that the residual inception method of Fig. 4(c) uses roughly 27 times more parameters than a single depthwise separable convolution, but about half as many as the sequence inception of Fig. 4(b) and roughly one third as many as the parallel inception of Fig. 4(a).
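A quick numerical check of these counts and ratios follows; the channel sizes are arbitrary, and biases and normalization parameters are ignored, as in the derivation above.

```python
# Numerical check of the parameter counts (M input channels, N output channels).
M, N = 64, 64
dsc      = 9 * M + M * N          # depthwise separable 3x3 convolution
parallel = (9 + 25 + 49) * M * N  # parallel 3x3 + 5x5 + 7x7 convolutions
sequence = 6 * 9 * M * N          # (1 + 2 + 3) stacked 3x3 convolutions
residual = 3 * 9 * M * N          # three 3x3 convolutions with output reuse

print(residual / dsc)       # ~23.7 -> approaches 27 as N grows
print(sequence / residual)  # 2.0
print(parallel / residual)  # ~3.07
```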
Ablation study design
To systematically evaluate component contributions, we designed controlled ablation studies replacing sophisticated components with functionally equivalent alternatives while maintaining identical training protocols and decoder architectures.
Model evaluation
We compared RepSegNet to eight baseline models: U-Net4, UNet++5, Attention U-Net6, ResUNet++32, Double U-Net33, TransUNet7, LeViT-UNet34, and DCSAU-Net35. These comparisons are crucial to demonstrate RepSegNet’s improvements in segmentation quality and efficiency. U-Net, UNet++, and Attention U-Net provide benchmarks for skip connections and feature integration. ResUNet++ offers a comparison for gradient flow in deep networks. Double U-Net allows evaluation against a multi-stage approach. TransUNet and LeViT-UNet enable comparison with transformer-based architectures. Finally, DCSAU-Net provides a benchmark for dense connections and spatial attention mechanisms. These comparisons highlight RepSegNet’s unique approach to feature extraction and integration, showcasing its performance against diverse architectural strategies in medical image segmentation.
Evaluation metrics
To assess our model’s performance and compare it with state-of-the-art approaches, we used precision, recall, F1 score, and mean Intersection over Union (mIoU). These metrics provide a comprehensive view of segmentation quality:
Precision measures the accuracy of positive predictions, crucial for assessing segmentation mask accuracy: \(\text{Precision} = \frac{TP}{TP + FP}\).

Recall quantifies the model’s ability to identify all relevant pixels or regions: \(\text{Recall} = \frac{TP}{TP + FN}\).

The F1 score balances precision and recall, which is useful for imbalanced datasets: \(F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}\).

mIoU, or the Jaccard index, evaluates the overlap between predicted and ground truth segmentation masks: \(\text{IoU} = \frac{TP}{TP + FP + FN}\). Here, TP, FP, and FN represent true positives, false positives, and false negatives, respectively.
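The four metrics can be sketched for a binary mask as follows; the epsilon guard and toy arrays are illustrative, not the paper’s evaluation code.

```python
# Sketch of the four segmentation metrics for a binary mask (NumPy).
import numpy as np

def segmentation_metrics(pred, gt, eps=1e-7):
    """pred, gt: binary arrays of the same shape (1 = foreground)."""
    tp = np.sum((pred == 1) & (gt == 1))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    precision = tp / (tp + fp + eps)
    recall    = tp / (tp + fn + eps)
    f1        = 2 * precision * recall / (precision + recall + eps)
    iou       = tp / (tp + fp + fn + eps)   # Jaccard index
    return precision, recall, f1, iou

pred = np.array([[1, 1, 0], [0, 1, 0]])
gt   = np.array([[1, 0, 0], [0, 1, 1]])
print(segmentation_metrics(pred, gt))  # ~(0.667, 0.667, 0.667, 0.5)
```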
Code availability
The complete source code for RepSegNet is publicly available on GitHub at https://github.com/rashidjuraev/RepSegNet. To ensure permanent accessibility and version control, the exact version of the code used in this study has been archived at Zenodo and assigned DOI: 10.5281/zenodo.17971714. The code is released under the MIT License with no access restrictions.
Experiments
Implementation details
The model was trained for a maximum of 100 epochs with a learning rate of 5e-4 and a weight decay of 0.01. We used the AdamW optimizer with layer-wise learning rate adjustments. The loss function combined SoftCrossEntropyLoss and DiceLoss (both with a 0.05 smoothing factor) via JointLoss. A CosineAnnealingLR scheduler adjusted the learning rate, with a minimum of 1e-6.
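This configuration can be sketched as follows, assuming the named losses come from the segmentation_models_pytorch package; the stand-in model, synthetic batch, and omission of the layer-wise learning-rate adjustments are simplifications.

```python
# Sketch of the training setup (stand-in model and synthetic data; the losses
# are assumed to come from segmentation_models_pytorch).
import torch
from segmentation_models_pytorch import losses

model = torch.nn.Conv2d(3, 2, 3, padding=1)  # stand-in for RepSegNet (2 classes)
criterion = losses.JointLoss(
    losses.SoftCrossEntropyLoss(smooth_factor=0.05),
    losses.DiceLoss(mode="multiclass", smooth=0.05),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=100, eta_min=1e-6)

# One synthetic batch stands in for the real training DataLoader.
train_loader = [(torch.randn(2, 3, 256, 256),
                 torch.randint(0, 2, (2, 256, 256)))]

for epoch in range(100):
    for images, masks in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), masks)
        loss.backward()
        optimizer.step()
    scheduler.step()
```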
Data augmentation
For the training set, images were resized to 256x256 pixels, with random horizontal flipping, ShiftScaleRotate, and brightness/contrast adjustments (all with 0.25 probability). CoarseDropout simulated occlusions. Images were normalized using ImageNet mean and standard deviation. Validation images were only resized and normalized to ensure accurate performance evaluation.
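A plausible albumentations pipeline matching this description is sketched below; the transform limits, left at library defaults, are assumptions.

```python
# Plausible augmentation pipeline for the described setup (albumentations).
import albumentations as A

IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

train_transform = A.Compose([
    A.Resize(256, 256),
    A.HorizontalFlip(p=0.25),
    A.ShiftScaleRotate(p=0.25),
    A.RandomBrightnessContrast(p=0.25),
    A.CoarseDropout(p=0.25),                      # simulates occlusions
    A.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
])

val_transform = A.Compose([                        # resize + normalize only
    A.Resize(256, 256),
    A.Normalize(mean=IMAGENET_MEAN, std=IMAGENET_STD),
])
```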
Datasets
We utilized three prominent medical image segmentation datasets: 1) ISIC-201836: 2594 dermoscopy images for lesion segmentation; 2) CVC-ClinicDB37: 612 polyp images (576x768) from colonoscopy videos; and 3) SegPC-202138: 775 microscopic images of plasma cells for 3-class pixel-level segmentation.
Representative examples from the three datasets used in this study. (a) CVC-ClinicDB: colonoscopy images showing polyps with corresponding ground truth masks. (b) ISIC 2018: dermoscopy images of skin lesions with segmentation masks. (c) SegPC-2021: microscopic images of plasma cells with pixel-level segmentation masks. These diverse imaging modalities demonstrate the versatility of our proposed approach across different medical domains.
Results
The performance and efficiency of our proposed model for medical image segmentation are thoroughly demonstrated through comprehensive comparisons with existing state-of-the-art models. The following tables provide empirical evidence of our model’s superiority in terms of accuracy (ACC.), precision (Prec.), recall (Rec.), F1 score (F1), mean Intersection over Union (mIoU), and model size.
Table 1 compares nine segmentation models on the ISIC-2018 dataset. DCSAU-Net achieves the highest accuracy (0.9586), while RepSegNet excels in Precision (0.9229), Recall (0.9262), F1 score (0.9245), and mIoU (0.8616). Attention U-Net and LeViT-UNet also perform strongly, with high precision and competitive mIoU, respectively. TransUNet shows lower performance compared to other models. Overall, RepSegNet demonstrates exceptional precision and recall, making it particularly effective for ISIC-2018 image segmentation.
Table 2 compares nine segmentation models on the CVC-ClinicDB dataset. DCSAU-Net achieves the highest accuracy (0.9880), but RepSegNet excels in Precision (0.9657), Recall (0.9427), F1 score (0.9540), and mIoU (0.9145). Attention U-Net and Double U-Net show competitive performance, with Double U-Net achieving high Recall (0.9219) and F1 score (0.8960). LeViT-UNet and TransUNet perform less favorably, particularly in Precision and mIoU. Overall, RepSegNet emerges as the most well-rounded model, demonstrating superior performance across multiple critical metrics for the CVC-ClinicDB dataset.
Table 3 compares nine segmentation models on the SegPC-2021 dataset. RepSegNet outperforms all other models across all metrics, achieving the highest Accuracy (0.9553), Precision (0.9246), Recall (0.9223), F1 score (0.9235), and mIoU (0.8598). DCSAU-Net follows with strong performance in Accuracy (0.9495) and mIoU (0.8048). UNet++ shows notable Precision (0.9192), while Double U-Net achieves high Recall (0.8963). TransUNet and LeViT-UNet demonstrate lower performance, particularly in Precision and mIoU. RepSegNet’s consistent top performance across all metrics establishes it as the most effective and reliable model for the SegPC-2021 dataset.
Table 4 analyzes the complexity of various segmentation models in terms of parameters, computational cost (GMAC), and model size. TransUNet is the most complex with 67.02M parameters and a 255.72 MB size, while DCSAU-Net is the least complex (2.6M parameters, 9.99 MB). RepSegNet demonstrates exceptional efficiency, reducing from 2.66M parameters during training to just 0.52M during inference, with corresponding decreases in GMAC (3.15 to 1.51) and size (10.44 MB to 4.68 MB). Traditional models like U-Net, UNet++, and Attention U-Net have larger parameter counts and sizes, with Attention U-Net being particularly computationally intensive (51.06 GMAC). RepSegNet’s significant complexity reduction during inference highlights its suitability for real-time processing on resource-constrained devices, making it an excellent choice for various clinical settings. Our ablation studies (Sect. “Ablation studies”) validate these architectural choices, demonstrating that removing reparameterization eliminates the 80.5% parameter reduction while degrading performance by 3.0 F1 points on average, and removing MultiPath blocks causes 8.7 F1 points of degradation while reducing feature extraction capability from four to two receptive field scales.
RepSegNet performance on CVC-ClinicDB dataset. Visual results demonstrating model performance on colonoscopy images: (a, d) Original colonoscopy images showing polyps; (b, e) Ground truth segmentation masks; (c, f) Prediction masks generated by RepSegNet. The model achieves high accuracy in polyp boundary delineation even in areas with complex tissue patterns and varied illumination conditions.
RepSegNet performance on ISIC 2018 dataset. Dermatological image segmentation results: (a, d) Original dermoscopy images of skin lesions; (b, e) Ground truth segmentation masks; (c, f) Prediction masks generated by RepSegNet. The model effectively captures both regular and irregular lesion boundaries, accurately segmenting lesions with varying colors, textures, and shapes.
RepSegNet performance on SegPC-2021 dataset. Cellular-level segmentation results: (a, d) Original microscopic images of plasma cells; (b, e) Ground truth segmentation masks; (c, f) Prediction masks generated by RepSegNet. The model achieves precise cell boundary delineation at the microscopic level, successfully distinguishing individual cells in dense clusters and capturing intricate cellular morphologies.
Figures 6, 7, and 8 provide a qualitative assessment of RepSegNet’s segmentation performance across the CVC-ClinicDB, ISIC 2018, and SegPC-2021 datasets. For the CVC-ClinicDB colonoscopy images, RepSegNet demonstrates high accuracy in delineating polyp boundaries, even in cases with complex tissue textures and varying illumination. In the ISIC 2018 skin lesion images, the model effectively captures both large, well-defined lesions and smaller, more subtle abnormalities, closely matching the ground truth segmentations. The SegPC-2021 results showcase RepSegNet’s capability in handling the intricate task of cellular-level segmentation in microscopic images. The model accurately identifies individual plasma cells and their components, maintaining clear distinctions between closely packed cells. Across all three datasets, RepSegNet’s predictions closely align with the ground truth masks, indicating robust performance in diverse medical imaging contexts. This visual evidence complements the quantitative results, highlighting RepSegNet’s versatility and effectiveness in tackling various medical image segmentation challenges.
Ablation studies
To validate our architectural design choices, we conducted systematic ablation studies evaluating the individual contributions of reparameterization and MultiPath components across all three datasets. We designed five controlled studies: (A) Full RepSegNet, (B) without reparameterization, (C) without MultiPathMobileBlock, (D) baseline with neither component, and (E) standard convolutions for comparison.
Component removal was implemented by replacing sophisticated blocks with functionally equivalent alternatives: RepMobileunits were substituted with standard depthwise separable convolutions, while complex MultiPathMobileBlock blocks were replaced with simplified 2-path inception blocks. This approach maintains architectural consistency while isolating specific innovations.
Results demonstrate that MultiPathMobileBlock blocks provide larger performance benefits (8.7 F1 points average) than reparameterization (3.0 F1 points), while reparameterization enables critical deployment efficiency through 80.5% parameter reduction. The progressive impact pattern—SegPC-2021 > CVC-ClinicDB > ISIC-2018—correlates with multi-scale feature requirements, validating our architecture’s scalability across clinical applications requiring varying detail discrimination levels.
Removing reparameterization eliminates deployment feasibility by fixing parameters at 1.89M and increasing inference time by 1.8\(\times\), rendering real-time clinical applications infeasible. MultiPathMobileBlock removal reduces receptive field scales from four to two, severely impacting diagnostic accuracy, particularly for microscopic imaging where cellular-level precision is critical.
Discussion and future directions
RepSegNet demonstrates notable improvements in medical image segmentation across the ISIC-2018, CVC-ClinicDB, and SegPC-2021 datasets. The model’s superior performance in accuracy, precision, recall, F1 score, and mean Intersection over Union (mIoU) compared to state-of-the-art architectures can be attributed to several key factors.
Multi-scale feature extraction: The novel combination of MultiPathMobileBlocks and RepMobileUnits enables efficient capture of both local details and global context through a sequential pathway design that specifically addresses the scale variance challenge in medical imagery, where structures range from cellular-level details to organ-level contours.
Computational efficiency: RepSegNet’s ability to reduce parameters from 2.66 million during training to 0.518 million during inference, while decreasing computational cost from 3.15 to 1.51 Gmac, represents a significant advancement in model efficiency. This characteristic is particularly valuable for real-time applications in resource-constrained environments.
Adaptive architecture: The use of residual connections to approximate larger convolutions allows RepSegNet to maintain a compact structure without compromising on the ability to capture complex features, addressing the ongoing challenge of balancing model complexity with segmentation accuracy.

Cross-modality performance: RepSegNet’s consistent performance across different imaging modalities (dermatology, colonoscopy, and microscopy) suggests a robust architecture capable of generalizing to various medical imaging tasks. This versatility could potentially reduce the need for modality-specific models in clinical workflows.

Systematic ablation studies confirm that RepSegNet’s superior performance stems from the synergistic interaction of reparameterization and MultiPath components. Reparameterization enables essential deployment efficiency (80.5% parameter reduction) while MultiPathMobileBlocks provide critical multi-scale feature extraction (8.7 F1 points improvement). This integrated approach successfully addresses medical image segmentation’s dual requirements: sophisticated training-time learning capabilities and deployment-ready efficiency for clinical applications.
Mechanistic analysis of performance improvements
Our comprehensive analysis reveals that RepSegNet’s superior performance stems from three fundamental architectural innovations that address specific challenges in medical image segmentation.

Multi-scale feature extraction through sequential pathways: The MultiPathMobileBlock’s effectiveness derives from its unique approach to capturing multi-scale features without the computational burden of traditional Inception modules. Unlike parallel multi-scale approaches that require independent pathways, our sequential design reuses intermediate computations while progressively expanding receptive fields (\(1\times 1 \rightarrow 3\times 3 \rightarrow 5\times 5 \rightarrow 7\times 7\) equivalent). This design specifically addresses the challenge in medical images where anatomical structures appear at vastly different scales, from fine cellular boundaries in microscopic imagery to large organ contours in radiological scans.

Training-time feature learning enhancement: The reparameterization technique’s effectiveness extends beyond mere parameter reduction. During training, the multi-branch structure provides enhanced gradient flow and feature diversity through multiple parallel pathways. Each branch contributes different representational aspects: the K\(\times\)K branch captures primary spatial features, 1\(\times\)1 branches enable cross-channel information mixing, and 1\(\times\)1-AVG branches provide spatial context averaging.

Dataset-specific adaptation mechanisms: Our analysis reveals that RepSegNet’s architecture adapts differently across medical imaging modalities:
- Dermatoscopic imagery (ISIC-2018): Shows the smallest dependency on architectural complexity (2.6 F1 point degradation), reflecting moderate multi-scale requirements for lesion boundary detection.
- Endoscopic imagery (CVC-ClinicDB): Demonstrates intermediate architectural sensitivity (3.1 F1 point degradation), where both components contribute significantly to handling challenging lighting conditions.
- Microscopic imagery (SegPC-2021): Exhibits the highest architectural dependency (3.3 F1 point degradation), where cellular-level precision demands maximum feature extraction sophistication.
Critical analysis of architectural success factors
The superior performance of RepSegNet across medical imaging modalities reveals fundamental characteristics of the segmentation task itself. Medical images present a unique challenge: pathological structures occur at unpredictable scales, requiring simultaneous access to fine-grained texture information and broad contextual understanding. Traditional multi-scale approaches using parallel pathways address this through redundant computation, while our sequential design achieves equivalent coverage through pathway reuse. This efficiency gain proves critical in resource-constrained clinical environments.

The effectiveness of reparameterization in medical imaging stems from a mismatch between training requirements and deployment constraints. Medical datasets remain relatively small (hundreds to thousands of images), requiring rich model capacity during optimization, yet deployment demands minimal computational overhead for real-time clinical integration. Reparameterization resolves this conflict by decoupling training complexity from inference efficiency, a solution particularly suited to medical imaging where data scarcity and deployment constraints both persist.

The architecture exhibits clear performance boundaries. The progressive degradation pattern across imaging modalities (SegPC-2021 > CVC-ClinicDB > ISIC-2018) indicates that performance gains correlate directly with multi-scale complexity requirements. For imaging tasks with limited scale variation or where objects appear at consistent sizes, the MultiPathMobileBlock architecture provides minimal benefit over simpler alternatives. Similarly, reparameterization offers negligible advantage in scenarios without stringent computational constraints or where model capacity during training proves sufficient.

These observations suggest that architectural choices must align with task-specific characteristics rather than pursuing universal optimality. The success of RepSegNet derives from matching design decisions to medical imaging’s particular requirements: simultaneous multi-scale feature extraction, limited training data, and deployment efficiency constraints.
Despite its strengths, RepSegNet faces several limitations. The current model is optimized for 2D image segmentation, limiting its applicability to volumetric imaging modalities such as CT and MRI, which are prevalent in many clinical scenarios. In addition, RepSegNet’s decision-making process lacks transparency, hindering trust and adoption in clinical practice where explainability is crucial. Finally, while the model performs well on the chosen datasets, these may not fully represent the diversity of real-world medical imaging scenarios, potentially limiting generalizability.
Future research should address three priorities: extending the architecture to volumetric imaging (CT, MRI, 3D ultrasound), integrating explainability mechanisms for clinical trust, and implementing dynamic complexity adjustment for diverse deployment scenarios. Clinical validation across broader patient populations and imaging protocols remains necessary. Multi-task extensions combining segmentation with classification or detection could enhance clinical utility.
Realizing clinical impact requires addressing identified limitations through volumetric extension, explainability integration, and validation across diverse patient populations and acquisition protocols. The model’s effectiveness demonstrates a broader principle: architectural decisions must align with task-specific characteristics rather than pursuing universal solutions. This framework—matching design to medical imaging’s scale variance, limited training data, and deployment constraints—informs development of efficient segmentation architectures beyond RepSegNet itself.
Data availability
The SegPC-2021 dataset is available through IEEE DataPort (https://ieee-dataport.org/open-access/s3-dataset). Access requires free IEEE account registration, but no membership fees. The complete source code for RepSegNet is publicly available on GitHub at https://github.com/rashidjuraev/RepSegNet. To ensure permanent accessibility and version control, the exact version of the code used in this study has been archived at Zenodo and assigned DOI: 10.5281/zenodo.17971714. The code is released under the MIT License with no access restrictions.
References
Tajbakhsh, N. et al. Embracing imperfect datasets: A review of deep learning solutions for medical image segmentation. Med. Image Analysis 63, 101693 (2020).
Litjens, G. et al. A survey on deep learning in medical image analysis. Med. Image Analysis 42, 60–88 (2017).
Joskowicz, L., Cohen, D., Caplan, N. & Sosna, J. Inter-observer variability of manual contour delineation of structures in CT. Eur. Radiology 29, 1391–1399 (2019).
Ronneberger, O., Fischer, P. & Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015 9351, 234–241 (Springer, 2015).
Zhou, Z., Rahman Siddiquee, M. M., Tajbakhsh, N. & Liang, J. Unet++: A nested u-net architecture for medical image segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support 11045, 3–11 (Springer, 2018).
Schlemper, J. et al. Attention gated networks: Learning to leverage salient regions in medical images. Med. Image Analysis 53, 197–207 (2019).
Chen, J. et al. Transunet: Transformers make strong encoders for medical image segmentation. arXiv:2102.04306 (2021).
Liu, H., Huo, G., Li, Q., Guan, X. & Tseng, M.-L. Multiscale lightweight 3d segmentation algorithm with attention mechanism: Brain tumor image segmentation. Expert Syst. with Appl. 214, 119166 (2023).
Zhang, Q. et al. A comparative study of attention mechanism based deep learning methods for bladder tumor segmentation. Int. J. Med. Informatics 171, 104984 (2023).
Xu, Y. et al. Fpga oriented lightweight deep learning inference for liver cancer segmentation. In 2024 IEEE International Symposium on Biomedical Imaging (ISBI), 1–5 (IEEE, 2024).
Sahoo, P. K., Soltani, S. & Wong, A. K. A survey of thresholding techniques. Comput. Vision, Graphics, Image Processing 41, 233–260 (1988).
Adams, R. & Bischof, L. Seeded region growing. IEEE Trans. Pattern Anal. Machine Intell. 16, 641–647 (1994).
Canny, J. A computational approach to edge detection. IEEE Trans. Pattern Anal. Machine Intell. 679–698 (1986).
Kass, M., Witkin, A. & Terzopoulos, D. Snakes: Active contour models. Int. J. Comput. Vision 1, 321–331 (1988).
Osher, S. & Sethian, J. A. Fronts propagating with curvature-dependent speed: algorithms based on hamilton-jacobi formulations. J. Comput. Physics 79, 12–49 (1988).
Li, S. Z. Markov random field modeling in image analysis (Springer Sci. & Bus, 2009).
Pham, D. L., Xu, C. & Prince, J. L. Current methods in medical image segmentation. Annu. Rev. Biomed. Engineering 2, 315–337 (2000).
Sharma, N. & Aggarwal, L. M. Automated medical image segmentation techniques. J. Med. Physics 35, 3 (2010).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25 (2012).
Oktay, O. et al. Attention u-net: Learning where to look for the pancreas. In Medical Imaging with Deep Learning (2018).
Howard, A. G. et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861 (2017).
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. & Chen, L.-C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4510–4520 (2018).
Tan, M. & Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning, 6105–6114 (PMLR, 2019).
Yakubovskiy, P. Efficientnet: Improving accuracy and efficiency through automl and model scaling. arXiv:1905.11946 (2019).
Wang, J., Zhang, H., He, Y., Wang, W. & He, H. A lightweight deep learning model for automatic segmentation of hippocampus in mr images. Eng. Appl. Artif. Intell. 95, 103920 (2020).
Baheti, B., Innani, S., Gajre, S. & Talbar, S. Eff-unet: A novel architecture for semantic segmentation in unstructured environment. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 1473–1481 (IEEE, 2020).
Ma, J., Zhang, L., Yang, Z., Huang, J. & Zhang, Y. Mobilesunet: A fast and efficient network for real-time semantic segmentation of high-resolution medical images. IEEE J. Biomed. Health Informatics 25, 3787–3797 (2021).
Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J. & Maier-Hein, K. H. nnu-net: a self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 18, 203–211 (2021).
Gibson, E. et al. Automatic multi-organ segmentation on abdominal ct with dense v-networks. IEEE Trans. Med. Imaging 37, 1822–1834 (2018).
Szegedy, C. et al. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1–9 (2015).
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. & Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2818–2826 (2016).
Jha, D. et al. Resunet++: An advanced architecture for medical image segmentation. In 2019 IEEE International Symposium on Multimedia (ISM), 225–2255 (IEEE, 2019).
Lou, A., Guan, S. & Loew, M. Dc-unet: rethinking the u-net architecture with dual channel efficient cnn for medical image segmentation. In Medical Imaging 2021: Image Processing 11596, 758–768 (SPIE, 2021).
Xu, G. et al. Levit-unet: Make faster encoders with transformer for biomedical image segmentation. Available at SSRN 4116174 (2022).
Yuan, W., Peng, Y., Guo, Y., Ren, Y. & Xue, Q. Dcau-net: dense convolutional attention u-net for segmentation of intracranial aneurysm images. Vis. Computing for Industry, Biomedicine, Art 5, 9 (2022).
Codella, N. et al. ISIC 2018: Skin lesion analysis towards melanoma detection. https://challenge2018.isic-archive.com/ (2018). Accessed: 2023-10-12.
Bernal, J. et al. CVC-ClinicDB, a database of colonoscopy videos for learning and inferring intelligence in endoscopy. J. Med. Imaging 4, 044110. https://doi.org/10.1117/1.jmi.4.4.044110 (2017).
Gupta, A. et al. SegPC-2021: Segmentation of Multiple Myeloma Plasma Cells in Microscopic Images. https://segpc-2021.grand-challenge.org/ (2021). Accessed: 2023-10-15.
Ding, X., Zhang, X., Ma, N., Han, J., Ding, G. & Sun, J. RepVGG: Making VGG-style ConvNets great again. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 13733–13742 (2021).
Li, X., Wang, Y., Zhang, Q., Chen, H. & Liu, J. RepMobileNet: Efficient neural network with reparameterizable mobile blocks. IEEE Access 10, 85998–86009 (2022).
Ding, X., Zhang, X., Han, J. & Ding, G. Diverse branch block: Building a convolution as an inception-like unit. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 10886–10895 (2021).
Acknowledgements
This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2025-00559998), and in part by the Regional Innovation System & Education (RISE) program through the Daegu RISE Center, funded by the Ministry of Education (MOE) and the Daegu Metropolitan City, Republic of Korea (2025-RISE-03-001). We also would like to thank the Kyungpook National University High-Performance Computing Center for providing computational resources.
Author information
Authors and Affiliations
Contributions
R.J. conceptualized the approach, implemented the model architecture, conducted all experiments, analyzed the results, and drafted the manuscript. J.-M.K. supervised the research, provided critical direction on methodology, and contributed to the interpretation of results. I.-M.K. provided theoretical insights on reparameterization techniques and reviewed the mathematical formulation. S.Y. contributed to the experimental design, assisted with performance evaluation strategies, and provided expertise on medical image analysis applications. All authors reviewed the manuscript and approved the final version.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Juraev, R., Kim, IM., Yun, S. et al. Efficient medical image segmentation using RepSegNet lightweight reparameterized neural network. Sci Rep 16, 4682 (2026). https://doi.org/10.1038/s41598-025-34973-6