Table 1 A summary of the previous literature on hybrid CNN-Transformer models, their contributions, and drawbacks.
From: Enhancing artistic style classification through a novel ArtFusionNet framework
| Study | Model | Limitations | Key Features | Proposed Innovation |
|---|---|---|---|---|
| Zhang et al.17 | CNN-Transformer with Attention | Complex architecture; lower performance on abstract art | Attention mechanisms to refine feature extraction | ArtFusionNet enhances multi-scale feature extraction and fusion across all artistic styles. |
| Liu et al.18 | Multi-scale CNN-Transformer | Increased computational load; limited performance across diverse datasets | Multi-level feature extraction through pyramid pooling | ArtFusionNet employs pyramid pooling and dilated convolutions for scalable, efficient feature extraction. |
| Huo et al.20 | Dual-band CNN-Transformer | High computational overhead; limited scalability | Simultaneous local and global feature extraction | ArtFusionNet integrates local and global features efficiently, with minimal overhead, via adaptive fusion. |
| Zhang et al.21 | Dynamic Weighting CNN-Transformer | Struggles with balanced fusion and computational cost | Dynamic fusion of CNN and Transformer features | ArtFusionNet balances the feature fusion process using an adaptive weighting mechanism. |