Fig. 5

Architecture of EfficientNetV2, which uses Fused-MBConv blocks in the early stages and MBConv blocks in the later stages. The side-by-side comparison shows that Fused-MBConv replaces the 1×1 expansion convolution and the depthwise 3×3 convolution of MBConv with a single standard 3×3 convolution, which makes better use of modern accelerators in the early, high-resolution stages. MBConv retains the depthwise convolution, which remains effective for feature extraction in the deeper stages.
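To make the structural difference between the two blocks concrete, the following minimal sketch counts convolution weights for each one. It is an illustration under stated assumptions, not the reference implementation: the function names, the example channel width (24), and the expansion ratio (4) are hypothetical, and biases, batch normalization, and squeeze-and-excitation parameters are ignored.

```python
def mbconv_params(c_in: int, c_out: int, expand: int, k: int = 3) -> int:
    """Weights in an MBConv block: 1x1 expand -> kxk depthwise -> 1x1 project."""
    c_mid = c_in * expand
    expand_1x1 = c_in * c_mid       # pointwise expansion
    depthwise = k * k * c_mid       # one kxk filter per channel
    project_1x1 = c_mid * c_out     # pointwise projection
    return expand_1x1 + depthwise + project_1x1


def fused_mbconv_params(c_in: int, c_out: int, expand: int, k: int = 3) -> int:
    """Weights in a Fused-MBConv block: kxk standard conv -> 1x1 project."""
    c_mid = c_in * expand
    fused_kxk = k * k * c_in * c_mid  # single standard conv replaces expand + depthwise
    project_1x1 = c_mid * c_out       # pointwise projection
    return fused_kxk + project_1x1


if __name__ == "__main__":
    # Hypothetical early-stage setting: 24 channels in and out, expansion ratio 4.
    print(mbconv_params(24, 24, 4))        # 5472
    print(fused_mbconv_params(24, 24, 4))  # 23040
```

In this setting the fused block carries more weights than the depthwise variant, so any speed advantage in the early stages would come from how well a dense 3×3 convolution maps onto accelerator hardware rather than from a smaller parameter count.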