Table 1 A summarized view of relevant studies focused on maize disease classification.

From: Enhanced residual-attention deep neural network for disease classification in maize leaf images

| Research Papers | Model | Pre-processing | Augmentation | Remark |
|---|---|---|---|---|
| [15] | Attention-CNN | Resizing to 168 × 168 | No | The model has 3 residual modules, 3 convolutions, and a GAP layer, followed by a SoftMax. |
| [17] | Attention-based DenseNet | Filtering, resizing to 224 × 224, edge filling, sharpening | Yes | Uses depth-separable convolutions in the dense blocks along with an attention mechanism. |
| [22] | VGG + Inception | Filtering, resizing to 224 × 224, sharpening | Yes | The final convolutional layers of VGG were replaced with a convolution layer, batch normalization, and Swish activation. |
| [23] | VGG16 | - | No | Otsu threshold segmentation is used to separate image pixels into two categories: bright intensity and darker intensity. |
| [24] | Modified DenseNet | - | Yes | Each dense block contains batch normalization, ReLU activation, a 3 × 3 convolution, and dropout; the layer between two dense blocks performs downsampling. |
| [25] | Modified Inception-v3 | Resizing to 256 × 256 | Yes | Three different Inception-v3-based models are developed. |
| [20] | EfficientNetB0 + DenseNet121 | Resizing to 244 × 244 | Yes | Features from multiple pre-trained CNNs are merged using concatenation. |
| [26] | VGG16, InceptionV3, ResNet50, Xception | Resizing to 224 × 224, 299 × 299, or 96 × 96, depending on the model | Yes | Hyperparameters tuned using Bayesian optimization. |
| [27] | TCI-AlexN | - | Yes | Improves AlexNet by adding a 3 × 3 × 256 convolution after the last pooling layer. |
| [29] | CNN trained from scratch | Cropping, expanding, mirroring | Yes | Potential areas for recognition are extracted by sharing the features of the transmission module layer by layer. |
| [33] | CNN trained from scratch | - | No | Used the image feature of Neuroph Studio for model training. |
| [34] | CNN trained from scratch | Resizing to 224 × 224, rescaling | Yes | The best model used 224 × 224 images, a batch size of 32, a 3 × 3 kernel, and an 80:20 train-test split. |
| [35] | CNN trained from scratch | Resizing to 227 × 227, rescaling | Yes | With three times fewer trainable parameters, the model achieved a 3.2% higher prediction accuracy than the best-performing pre-trained network. |
| [36] | CNN trained from scratch | Resizing to 224 × 224 | Yes | The model has three convolutional layers, each followed by a pooling layer, and then the corresponding dense layers. |
| [37] | Plant-Xvit | - | No | The principal components are the Conv2D blocks of the VGG and Inception models together with the ViT elements, MLP and multi-head attention (MHA) with linear projections. |
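Study [23] relies on Otsu threshold segmentation to split pixels into bright and dark classes. As a minimal sketch of that technique (not the cited paper's exact pipeline; the function name is illustrative), Otsu's method selects the grey-level threshold that maximizes the between-class variance of the two resulting pixel groups:

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return the 0-255 threshold maximizing between-class variance (Otsu)."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total = gray.size
    sum_all = float(np.dot(np.arange(256), hist))  # weighted sum of all intensities
    best_t, best_var = 0, 0.0
    w0, sum0 = 0, 0.0  # pixel count and intensity sum of the "dark" class
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0 = sum0 / w0            # mean of dark class
        mu1 = (sum_all - sum0) / w1  # mean of bright class
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

Pixels above the returned threshold fall into the bright-intensity category, the rest into the darker one, matching the two-class split described in [23].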
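Several of the listed studies augment training data geometrically; [29], for instance, uses cropping, expanding, and mirroring. A minimal NumPy sketch of two of those operations (function names are illustrative, not taken from any cited paper):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def mirror(img: np.ndarray) -> np.ndarray:
    """Horizontal mirroring: flip the image along its width axis."""
    return img[:, ::-1]

def random_crop(img: np.ndarray, size: int) -> np.ndarray:
    """Random square crop of side `size` from an HxW(xC) image."""
    h, w = img.shape[:2]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return img[top:top + size, left:left + size]
```

Applying such transforms to each training image multiplies the effective dataset size, which is why most rows above report "Yes" under Augmentation.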