Table 1. Comparison of Transformer-based fusion approaches.
| Approach | Data Types | Attention Mechanism | Domain Adaptation | Industrial Validation |
|---|---|---|---|---|
| Standard Transformer [9] | Homogeneous sequences | Single-scale self-attention | General purpose | Limited |
| Vision Transformer [10] | Image + text | Patch-based attention | Computer vision | None |
| TimeSformer [11] | Video sequences | Spatial–temporal attention | Video analysis | None |
| Industrial BERT [12] | Text + numerical | Pre-trained embeddings | Manufacturing | Simulated data |
| Proposed Method | Multi-modal heterogeneous | Multi-scale adaptive | Chemical engineering | Real-world deployment |