Table 1 A summary of prior literature on hybrid CNN-Transformer models, their contributions, and drawbacks.

From: Enhancing artistic style classification through a novel ArtFusionNet framework

| Study | Model | Limitations | Key Features | Proposed Innovation |
| --- | --- | --- | --- | --- |
| Zhang et al.17 | CNN-Transformer with Attention | Complex architecture; lower performance on abstract art | Attention mechanisms to refine feature extraction | ArtFusionNet enhances multi-scale feature extraction and fusion across all artistic styles. |
| Liu et al.18 | Multi-scale CNN-Transformer | Increased computational load; limited performance across diverse datasets | Multi-level feature extraction through pyramid pooling | ArtFusionNet employs pyramid pooling and dilated convolutions for scalable, efficient feature extraction. |
| Huo et al.20 | Dual-band CNN-Transformer | High computational overhead; limited scalability | Simultaneous local and global feature extraction | ArtFusionNet integrates local and global features efficiently, with minimal overhead, via adaptive fusion. |
| Zhang et al.21 | Dynamic Weighting CNN-Transformer | Struggles with balanced fusion; high computational cost | Dynamic fusion of CNN and Transformer features | ArtFusionNet balances feature fusion with an adaptive weighting mechanism. |
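The adaptive weighting idea recurring in the last column can be sketched as a learned convex combination of the two feature streams. The sketch below is illustrative only: the function and parameter names (`adaptive_fusion`, `w_gate`) are assumptions, not ArtFusionNet's actual API, and the gate here is a simple logistic weight rather than whatever parameterization the paper uses.

```python
import numpy as np

def adaptive_fusion(cnn_feat, transformer_feat, w_gate):
    """Fuse local (CNN) and global (Transformer) feature vectors with a
    learned scalar gate. Hypothetical sketch, not the paper's exact method."""
    # Gate: logistic weight computed from the concatenated features.
    z = np.concatenate([cnn_feat, transformer_feat]) @ w_gate
    alpha = 1.0 / (1.0 + np.exp(-z))  # sigmoid -> weight in (0, 1)
    # Convex combination balances the two streams adaptively per input.
    return alpha * cnn_feat + (1.0 - alpha) * transformer_feat

rng = np.random.default_rng(0)
d = 8
cnn_feat = rng.normal(size=d)          # stand-in local features
transformer_feat = rng.normal(size=d)  # stand-in global features
w_gate = rng.normal(size=2 * d)        # stand-in gate parameters
fused = adaptive_fusion(cnn_feat, transformer_feat, w_gate)
print(fused.shape)  # (8,)
```

Because the gate output lies in (0, 1), each fused component stays between the corresponding CNN and Transformer values, which is one way to keep the fusion "balanced" as the table describes.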