Table 3. Model architecture parameter configuration.
| Module Name | Parameter Settings | Function Description |
|---|---|---|
| Multi-Scale Attention | Scales: [1, 2, 4, 8]; Heads: 8 per scale | Captures temporal dependencies at multiple resolutions |
| Cross-Modal Alignment | Projection dim: 512; Temperature: 0.1 | Aligns features across different data modalities |
| Adaptive Gating | Hidden dim: 256; Dropout: 0.1 | Dynamic weight allocation for data sources |
| Position Encoding | Max length: 2048; Type: Sinusoidal | Incorporates positional information for sequences |
| Feed-Forward Network | Hidden dim: 2048; Activation: GELU | Nonlinear transformation in each layer |
| Layer Normalization | Epsilon: 1e-6; Learnable params: True | Stabilizes training and improves convergence |
| Fusion Transformer | Layers: 12; Model dim: 768 | Main processing backbone for integrated features |
| Output Projection | Classes: Variable; Activation: Softmax/Linear | Task-specific prediction head |
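
For readers who want to reproduce the setup, the values in Table 3 can be collected into a single configuration object. The following is a minimal sketch and not the original implementation: the class name `ModelConfig`, its field names, and the helper `sinusoidal_position` are hypothetical. Two further assumptions are labeled in the comments: that the 8 attention heads evenly split the 768-dimensional model width, and that the sinusoidal encoding uses the common base of 10000, which the table does not specify.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical container for the values listed in Table 3."""

    # Multi-Scale Attention
    attention_scales: Tuple[int, ...] = (1, 2, 4, 8)
    heads_per_scale: int = 8
    # Cross-Modal Alignment
    alignment_proj_dim: int = 512
    alignment_temperature: float = 0.1
    # Adaptive Gating
    gating_hidden_dim: int = 256
    gating_dropout: float = 0.1
    # Position Encoding
    max_length: int = 2048
    position_encoding: str = "sinusoidal"
    # Feed-Forward Network
    ffn_hidden_dim: int = 2048
    ffn_activation: str = "gelu"
    # Layer Normalization
    layer_norm_eps: float = 1e-6
    layer_norm_learnable: bool = True
    # Fusion Transformer
    num_layers: int = 12
    model_dim: int = 768
    # Output Projection: the class count is task-dependent ("Variable"),
    # so it has no default value here.
    num_classes: Optional[int] = None
    output_activation: str = "softmax"  # or "linear", depending on the task

    def head_dim(self) -> int:
        # Assumption: the heads evenly split the model dimension.
        if self.model_dim % self.heads_per_scale != 0:
            raise ValueError("model_dim must be divisible by heads_per_scale")
        return self.model_dim // self.heads_per_scale


def sinusoidal_position(pos: int, dim: int, base: float = 10000.0) -> List[float]:
    """Standard sinusoidal encoding for one position.

    Assumption: base 10000 and interleaved sin/cos layout.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    encoding: List[float] = []
    for i in range(0, dim, 2):
        angle = pos / (base ** (i / dim))
        encoding.extend([math.sin(angle), math.cos(angle)])
    return encoding


cfg = ModelConfig(num_classes=10)  # 10 classes is an illustrative value only
per_head = cfg.head_dim()          # 768 // 8 = 96 under the assumption above
first_position = sinusoidal_position(0, cfg.model_dim)
```

Freezing the dataclass keeps the configuration immutable after construction, so one object can be shared across modules without risk of accidental modification.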