Table 3 Model Architecture Parameter Configuration.

From: Multi-source heterogeneous data fusion and intelligent prediction modeling for chemical engineering construction projects based on improved transformer architecture

| Module Name | Parameter Settings | Function Description |
|---|---|---|
| Multi-Scale Attention | Scales: [1, 2, 4, 8], Heads: 8 per scale | Captures temporal dependencies at multiple resolutions |
| Cross-Modal Alignment | Projection dim: 512, Temperature: 0.1 | Aligns features across different data modalities |
| Adaptive Gating | Hidden dim: 256, Dropout: 0.1 | Dynamic weight allocation for data sources |
| Position Encoding | Max length: 2048, Sinusoidal type | Incorporates positional information for sequences |
| Feed-Forward Network | Hidden dim: 2048, Activation: GELU | Nonlinear transformation in each layer |
| Layer Normalization | Epsilon: 1e-6, Learnable params: True | Stabilizes training and improves convergence |
| Fusion Transformer | Layers: 12, Model dim: 768 | Main processing backbone for integrated features |
| Output Projection | Classes: Variable, Activation: Softmax/Linear | Task-specific prediction head |
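
The settings in Table 3 can be gathered into a single configuration object, which makes the values easier to check against each other. The sketch below is illustrative only and is not the authors' code: the class name `ModelConfig`, its field names, and the helper `sinusoidal_encoding` are hypothetical. The table states only that the position encoding is of "sinusoidal type", so the sketch assumes the standard sine/cosine formulation and assumes the encoding width equals the model dimension.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ModelConfig:
    """Values copied from Table 3; all field names are hypothetical."""
    attention_scales: Tuple[int, ...] = (1, 2, 4, 8)  # Multi-Scale Attention
    heads_per_scale: int = 8
    alignment_projection_dim: int = 512               # Cross-Modal Alignment
    alignment_temperature: float = 0.1
    gating_hidden_dim: int = 256                      # Adaptive Gating
    gating_dropout: float = 0.1
    max_sequence_length: int = 2048                   # Position Encoding
    ffn_hidden_dim: int = 2048                        # Feed-Forward Network
    ffn_activation: str = "gelu"
    layer_norm_eps: float = 1e-6                      # Layer Normalization
    layer_norm_learnable: bool = True
    num_layers: int = 12                              # Fusion Transformer
    model_dim: int = 768


def sinusoidal_encoding(position: int, dim: int) -> List[float]:
    """Encode one position with the standard sinusoidal scheme (assumed).

    Even indices hold sine terms and odd indices hold cosine terms; each
    sine/cosine pair shares one frequency.
    """
    encoding = []
    for i in range(dim):
        pair_index = i // 2
        angle = position / (10000 ** (2 * pair_index / dim))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding


config = ModelConfig()
# Assumption: the encoding width matches the backbone's model dimension.
first_position = sinusoidal_encoding(0, config.model_dim)
last_position = sinusoidal_encoding(config.max_sequence_length - 1, config.model_dim)
```

One consistency check this allows: if each scale splits the 768-dimensional model space evenly across its 8 heads (an assumption, since the table does not give a per-head dimension), the division is exact.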