Table 2 Detailed structure of the proposed HyperFusion-Net architecture.
| Stages | Layers | Output size |
|---|---|---|
| Input image | Input layer (resize to 224 × 224) | 224 × 224 × 3 |
| Stem | Conv2D(64) @ 3 × 3, stride 1 + BatchNorm + ReLU | 112 × 112 × 64 |
| | MaxPooling2D @ 2 × 2, stride 2 | 56 × 56 × 64 |
| Transformer encoder | Multi-Path ViT with patch embedding (16 × 16), 4 encoder layers, and multi-head attention | 14 × 14 × 768 |
| | Positional Encoding + LayerNorm + FeedForward (MLP) | 14 × 14 × 768 |
| Segmentation branch | Attention U-Net Decoder with skip connections from patch embeddings | 224 × 224 × 1 |
| | Upsampling + Conv2D(128 → 64 → 32) + Attention Gates | |
| Fusion block | Cross-Attention Fusion between Transformer and U-Net outputs | 224 × 224 × 64 |
| Classifier head | Global Average Pooling + Dense(128) + Dropout(0.5) + Dense(1) + Sigmoid | 1 |
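The Keras sketch below illustrates how the stages of Table 2 can be wired together; it is a minimal reference, not the authors' implementation. Details the table does not specify are assumptions: the stem's stride of 2 (chosen so the output matches 112 × 112 × 64), the number of attention heads, the MLP width, the simplified 1 × 1 sigmoid attention gates, the use of the stem features as a decoder skip connection, and the cross-attention being computed at 14 × 14 resolution before upsampling to 224 × 224 × 64.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model


class AddPositionalEmbedding(layers.Layer):
    """Adds a learned positional embedding to a (batch, tokens, dim) sequence."""
    def build(self, input_shape):
        self.pos = self.add_weight(name="pos_embed",
                                   shape=(1, input_shape[1], input_shape[2]),
                                   initializer="random_normal", trainable=True)

    def call(self, x):
        return x + self.pos


def up_block(x, filters, skip=None):
    """Upsampling + Conv2D + a simplified 1x1 sigmoid attention gate (assumption)."""
    x = layers.UpSampling2D(2)(x)
    if skip is not None:
        x = layers.Concatenate()([x, skip])
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    gate = layers.Conv2D(filters, 1, activation="sigmoid")(x)
    return layers.Multiply()([x, gate])


def build_hyperfusion_net(input_shape=(224, 224, 3)):
    inputs = layers.Input(shape=input_shape)                           # 224 x 224 x 3

    # Stem: Conv2D(64) 3x3 + BatchNorm + ReLU, then 2x2 max pooling.
    # strides=2 is assumed here so the output matches the 112 x 112 x 64 row.
    x = layers.Conv2D(64, 3, strides=2, padding="same")(inputs)        # 112 x 112 x 64
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    stem = layers.MaxPooling2D(2, strides=2)(x)                        # 56 x 56 x 64

    # Transformer encoder: 16x16 patch embedding, positional encoding,
    # 4 encoder layers of multi-head attention + LayerNorm + MLP.
    patches = layers.Conv2D(768, 16, strides=16)(inputs)               # 14 x 14 x 768
    seq = layers.Reshape((196, 768))(patches)
    seq = AddPositionalEmbedding()(seq)
    for _ in range(4):
        attn = layers.MultiHeadAttention(num_heads=8, key_dim=96)(seq, seq)
        seq = layers.LayerNormalization()(seq + attn)
        mlp = layers.Dense(3072, activation="gelu")(seq)
        mlp = layers.Dense(768)(mlp)
        seq = layers.LayerNormalization()(seq + mlp)
    vit = layers.Reshape((14, 14, 768))(seq)                           # 14 x 14 x 768

    # Segmentation branch: attention-gated U-Net-style decoder.
    # The stem features serve as a 56x56 skip connection for illustration only;
    # Table 2 does not state how the stem is wired into the decoder.
    d = up_block(vit, 128)                                             # 28 x 28 x 128
    d = up_block(d, 64, skip=stem)                                     # 56 x 56 x 64
    d = up_block(d, 32)                                                # 112 x 112 x 32
    d = layers.UpSampling2D(2)(d)                                      # 224 x 224 x 32
    seg_mask = layers.Conv2D(1, 1, activation="sigmoid",
                             name="segmentation")(d)                   # 224 x 224 x 1

    # Fusion block: decoder features query the Transformer tokens via cross-attention
    # at 14x14 resolution, then the fused map is upsampled to 224 x 224 x 64.
    q = layers.Conv2D(64, 1)(layers.AveragePooling2D(16)(d))           # 14 x 14 x 64
    q = layers.Reshape((196, 64))(q)
    kv = layers.Reshape((196, 768))(vit)
    fused = layers.MultiHeadAttention(num_heads=4, key_dim=64)(q, kv)  # 196 x 64
    fused = layers.Reshape((14, 14, 64))(fused)
    fused = layers.UpSampling2D(16)(fused)                             # 224 x 224 x 64
    fused = layers.Add()([fused, layers.Conv2D(64, 1)(d)])

    # Classifier head: GAP + Dense(128) + Dropout(0.5) + Dense(1) + sigmoid.
    h = layers.GlobalAveragePooling2D()(fused)
    h = layers.Dense(128, activation="relu")(h)
    h = layers.Dropout(0.5)(h)
    cls = layers.Dense(1, activation="sigmoid", name="classification")(h)

    return Model(inputs, [seg_mask, cls], name="HyperFusionNet")


model = build_hyperfusion_net()
model.summary()
```

Calling `model.summary()` confirms the per-stage output sizes of Table 2 (112 × 112 × 64 and 56 × 56 × 64 in the stem, 14 × 14 × 768 after the encoder, 224 × 224 × 1 for the mask, 224 × 224 × 64 after fusion, and a single sigmoid output from the classifier head).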