Software application in early blight detection in tomatoes using modified MobileNet architecture

Appati, Justice Kwame; Wellu, Ziem Patrick; Amissah, Daniel Kwame; Boante, Leonard Mensah

doi:10.1038/s41598-025-24101-9

Article
Open access
Published: 25 January 2026

Justice Kwame Appati¹,
Ziem Patrick Wellu¹,
Daniel Kwame Amissah¹ &
…
Leonard Mensah Boante¹

Scientific Reports volume 16, Article number: 3482 (2026)

Abstract

This study presents an automated framework for early blight detection in tomato plants using a modified MobileNet architecture. To address the limitations of traditional labor-intensive methods, it proposes a two-stage pipeline combining (1) transfer learning with depthwise separable convolutions for efficient feature extraction and (2) a meta-learned ensemble of Random Forest, SVM, and Gradient Boosting classifiers to handle real-world variability in lighting and environmental conditions. The approach introduces two custom convolutional layers (Custom_Feature_Extraction_Block) that improve F1-score by +3.8 points over the MobileNet baseline, with the ensemble contributing an additional +2.1 points. Evaluated on a balanced PlantVillage dataset (1,982 images) with extensive augmentation to simulate variable lighting and orientations, the system achieved up to 100% accuracy with selected classifiers on a held-out validation subset of 30 images under controlled conditions. To assess generalization, we further validated the framework on an independent dataset (tomato_dataset_v2; 30,609 images, 10 classes) containing field-acquired tomato leaf images, where the model attained 94.5% accuracy, confirming robustness beyond controlled environments. Comparative analysis with 10 recent methods demonstrates superior accuracy-efficiency trade-offs, offering practical on-device decision support for smallholder farmers. The framework’s lightweight design (4.2 M parameters, 23 ms/image on Raspberry Pi 4) and validated scalability underscore its potential for mobile and drone-based agricultural deployment. The work thereby addresses critical needs in global food security through accessible plant disease detection.

Introduction

The historical evolution of plant disease detection techniques has transitioned from manual optical scrutiny by agronomists to contemporary technological advancements¹. Previous methodologies, while essential, were limited by subjectivity and the human eye’s inability to discern subtle discrepancies. Notable progress came with laboratory tests like microscopy and serological assays, marking a significant leap forward. However, these approaches proved time-intensive, requiring expert interpretation and creating bottlenecks in efficiently distinguishing common diseases². The severity of plant diseases and their impact on global food security have prompted the exploration of more systematic and automated solutions. The fusion of artificial intelligence and computer vision in modern times has revitalized plant disease detection, potentially revolutionizing the process³. This is particularly crucial given the projected global population surge to 9 billion by 2050⁴, and the need for innovative strategies to reduce yield losses and secure sustainable food sources. However, within these transformative possibilities lies the challenge of developing robust and resilient disease classification models⁵. Existing methodologies struggle with complexities inherent in real-world settings, such as variability in lighting, environmental conditions⁶, and diverse plant appearances. This research gap underscores the necessity for novel methodologies integrating deep learning techniques, custom feature extraction, and ensemble strategies to create dependable and robust disease detection systems⁷. In response to this backdrop, the study aims to bridge the research gap by presenting a comprehensive framework that blends the strengths of diverse techniques. Leveraging transfer learning, the study uses MobileNet as the foundational model to establish image representation. 
Custom convolutional layers (Custom_Feature_Extraction_Block1 and Custom_Feature_Extraction_Block2) are introduced to capture intricate disease-related features, while ensemble classifiers safeguard prediction accuracy. This research strives to rejuvenate automated plant disease detection systems, making them systematic, precise, and scalable. Such advancements significantly affect global food security and sustainable agriculture⁸. The overarching objectives are to enhance the efficiency of disease detection, overcome real-world complexities, and contribute to the global endeavor for food security. To address these goals, the study explores two research questions: (1) “How does the integration of custom convolutional layers with MobileNet improve feature extraction for early blight detection?” and (2) “What is the comparative performance of ensemble classifiers (Random Forest, SVM, Gradient Boosting) in classifying transitional features?” Both questions concern the effectiveness of the proposed framework in handling diverse environmental conditions and achieving high accuracy in disease classification. The proposed method integrates transfer learning, custom convolutional layers, and ensemble classifiers to create a holistic and resilient approach to automated plant disease detection.

Generic state-of-the-art proposals and research interventions

Recent advancements in computer vision and deep learning have revolutionized disease detection and diagnosis across various domains, including agriculture. In agricultural vision, research on crop-specific disease diagnosis has accelerated⁹. MobileNet variants have been explored for potato and tomato blight¹⁰, EfficientNet for apple scab¹¹, and an improved DenseNet fused with a defogging algorithm for fruit disease identification¹². PlaNet, a robust deep convolutional neural network model, has been proposed for plant leaf disease recognition¹³. Unlike earlier studies, the present work (i) targets on-device inference, (ii) quantifies robustness across lighting, and (iii) provides a detailed ablation of architectural choices. Recent advancements in deep learning have also significantly improved plant disease detection. For example, Liu & Wang¹⁴ proposed an early recognition method for tomato gray leaf spot based on a MobileNetv2-YOLOv3 model that achieves a good balance between accuracy and real-time detection. Slimani, Mhamdi, & Jilbab¹⁵ presented a critical review of drone-assisted plant disease identification using artificial intelligence, highlighting the efficacy of deep learning techniques in automating disease detection tasks.

However, in agriculture, there is a pressing need for specialized solutions to address crop diseases such as early blight in tomatoes. Early blight, caused by fungal pathogens, poses a significant threat to tomato crops, leading to substantial yield losses if not managed effectively. While existing deep learning models have demonstrated success in disease detection, a notable research gap exists in developing tailored architectures specifically designed for early blight detection in tomatoes.

In response to this gap, the proposed research aims to develop a “Modified MobileNet Architecture for Early Blight Detection in Tomatoes.” MobileNet, known for its efficiency and suitability for mobile and embedded devices, serves as the foundation for the proposed architecture. By adapting and optimizing MobileNet for early blight detection, the research addresses the unique challenges associated with identifying and diagnosing fungal diseases in tomato plants. Other research works¹⁶,¹⁷ provide further insight into how far state-of-the-art models have advanced in computer vision.

Through this specialized approach, the research endeavors to contribute to the advancement of precision agriculture practices, enabling farmers to detect early blight in tomatoes accurately and efficiently. By leveraging the power of deep learning and tailoring the architecture to the specific characteristics of tomato leaf images, the proposed study aims to empower farmers with valuable tools for disease management and crop protection.

While several prior works rely on lightweight variants of MobileNet (e.g., MobileNetV3) or attention-augmented CNNs for plant disease detection¹⁸, these models often trade interpretability or ensemble flexibility for efficiency. Our framework departs from these approaches through three key innovations: (1) dual, task-specific Custom_Feature_Extraction_Blocks (with 3 × 3 and 5 × 5 convolutions) designed to capture both the fine-grained textures and the larger lesion patterns characteristic of early blight, which generic feature extractors often miss; (2) a meta-learned ensemble that dynamically weights classifiers based on validation performance to handle real-world variability, moving beyond simple averaging or voting; and (3) a holistic focus on the accuracy-efficiency trade-off, validated through extensive ablation studies and independent testing on a large, field-based dataset (tomato_dataset_v2), which is less common in prior literature. This design enables improved generalization beyond what single lightweight architectures achieve, particularly under the variable lighting and texture conditions observed in smallholder farms.

Overall, the research bridges the gap between recent advancements in deep learning and the specific challenges faced in agricultural disease detection, offering a promising avenue for improving crop health monitoring and ensuring food security.

The remainder of the paper is organized as follows: “Related works” reviews related works; “Methods and materials” details the methodology; “Results and discussions” presents results, comparative analysis, and ablation; and “Conclusion” discusses limitations and future work.

Related works

In the early days of plant disease detection, the primary approach was manual optical scrutiny conducted by experienced agronomists. This method relied heavily on human observation, which was inherently subjective and limited by the human eye’s ability to discern subtle differences in plant health. Although it was the initial step toward understanding and identifying diseases, this approach lacked objectivity and scalability¹⁸. As technology advanced, plant disease detection shifted towards laboratory tests, such as microscopy and serological assays. These techniques marked a significant improvement, providing more objective results than manual observation. However, they proved to be time-intensive and required expert knowledge for interpretation. While these methods represented a leap forward in accuracy, they could not distinguish common diseases efficiently, creating a bottleneck in large-scale disease identification¹⁹. In contemporary times, the fusion of artificial intelligence (AI) and computer vision has redefined plant disease detection. Automated techniques, leveraging deep learning models, have emerged as powerful tools for rapidly and accurately identifying diseases on a large scale. The use of AI has addressed the scalability issue posed by earlier methods, enabling more efficient and widespread disease detection³.

Despite progress, current methodologies face challenges inherent in real-world agricultural settings. Issues such as variability in lighting, environmental conditions⁶, and diverse plant appearances still obstruct accurate disease classification. The need for robust and resilient disease classification models persists. Some works have attempted to address these challenges by incorporating ensemble strategies, combining the strength of multiple classifiers to enhance prediction accuracy⁷.

Recent surveys²⁰ demonstrate AI’s transformative role in crop disease management, analyzing 2010–2021 research from IEEE/Scopus. Studies show AI improves detection accuracy (average +22%) and scope (38+ diseases identified), though variability in weather conditions remains a challenge for model generalization. This highlights the need for robust architectures like our modified MobileNet.

In “A Robust Plant Leaf Disease Recognition System Using Convolutional Neural Networks,” Diponkor et al.²¹ addressed the challenges of variability in lighting and diverse plant appearances by proposing a robust plant disease recognition model based on Convolutional Neural Networks (CNNs). The authors used transfer learning with a pre-trained CNN, enhancing the model’s ability to extract relevant image features. Additionally, they employed ensemble techniques, combining the predictions of multiple CNNs to improve classification accuracy in complex and dynamic agricultural environments.

Upadhyay & Gupta²² addressed a critical gap in fungal disease detection using improved ResNeXt (98.94% accuracy on apple crops). Their work emphasizes mycotoxin risks and the limitations of transfer learning (Inception-v7, ResNet underperformed), reinforcing our choice of Custom_Feature_Extraction_Blocks for feature extraction in tomato blight.

Focusing on the challenges of environmental conditions, Xu, Ding, Qiao, & Zhang²³ explore the use of crop prescription data to diagnose six common tomato diseases and pests accurately. The model is based on ensemble learning and multi-classification algorithms, and employs the recursive feature elimination method for feature selection. The dataset consists of 12,323 prescription records, with 2,607 records for tomato virus disease, 3,248 records for tomato late blight, 1,489 records for tomato gray mold, 2,061 records for aphid, 2,679 records for thrips, and 1,239 records for whitefly. The recursive feature elimination method based on gradient boosting decision tree (RFECV-GDBT) is employed to select 37 optimal features from the original 50 features, thus reducing the complexity of early data collection and the interference noise. The eight standard classification models used in this study are K-nearest neighbor (KNN), Decision tree (DT), Support vector machine (SVM), Random forest (RF), AdaBoost, Gradient boosting decision tree (GDBT), XGBoost, and LightGBM (LGBM). The Stacking-based model achieves an accuracy of 98.5% on the test set.

While ensemble strategies have improved robustness, there are still gaps in achieving consistently high accuracy across diverse conditions. For instance, recent works have focused on custom feature extraction in addition to deep learning techniques, aiming to capture intricate disease-related features that standard models may overlook. These efforts have indeed improved classification accuracy, but there is room for further refinement, especially in handling complex patterns associated with various environmental factors.

Innovative meta-learning approaches²⁴ enable maturity classification with limited mango data via cosine-distance adaptation. While focused on harvest timing, their segmentation-feature extraction pipeline validates our preprocessing methodology for disease localization in tomato leaves.

In their research on tomato leaf disease detection, Sanida, Sideris, Sanida, & Dasygenis²⁵ utilized effective pre-processing techniques. Image segmentation and feature extraction were key steps for precise disease identification. The K-means clustering algorithm was applied for image segmentation, facilitating the grouping of similar data points. Additionally, feature extraction methods, including Histogram of Oriented Gradients (HOG) and Local Binary Patterns (LBP), were employed to capture essential information from segmented images. These techniques aimed to enhance the system’s ability to identify distinct patterns associated with various tomato leaf diseases. The study highlights the significance of a robust pre-processing strategy in automating disease recognition in agricultural contexts.

In a similar study, the authors applied image segmentation and feature extraction techniques to locate and classify lung growth in CT images. They used the Otsu thresholding algorithm for image segmentation and the Local Binary Patterns (LBP) algorithm for feature extraction²⁶. They employed the U-Net model, which comprises an encoder that extracts features from the input image and a decoder that rebuilds the image from the extracted features. They used a median filter to reduce noise and intensity normalization to improve the contrast. For feature extraction, the researchers extracted shape, intensity, and texture features and then used a Support Vector Machine classifier to classify the extracted features into two classes.

“Plant Disease Detection Using Deep Convolutional Neural Network” introduced an adaptive plant disease classification model based on Deep Convolutional Neural Networks (DCNNs). Pandian, et al.,²⁷ proposed a novel feature extraction mechanism that dynamically adjusts to changing environmental factors. They incorporate attention mechanisms within the DCNN architecture to prioritize relevant features, improving the model’s adaptability. The study showcases enhanced classification performance under varying conditions.

Unlike MobileNetV3, which optimizes for general-purpose mobile vision tasks, our custom blocks are explicitly tailored for the visual semantics of plant disease. Similarly, while attention mechanisms²⁹ improve feature weighting, they often increase computational cost. Our approach achieves a comparable boost in feature discrimination through strategically sized convolutional filters and a Swish activation in the second block, maintaining a lower operational footprint suitable for on-device deployment.

Recent studies using MobileNetV3 and attention-augmented CNNs have advanced lightweight modeling for plant disease detection. However, these models are typically optimized for clean benchmark datasets and do not explicitly address multimodal fusion or noise resilience³⁰. In contrast, our approach integrates custom MobileNetV2-based feature extraction with additional convolutional blocks and a meta-learned ensemble, offering superior adaptability and robustness. This distinction positions our framework as complementary yet novel compared to existing lightweight strategies.

Methods and materials

This section details the experimental framework, including: (1) dataset characteristics and preprocessing, (2) model architecture modifications, and (3) evaluation protocols. All experiments were conducted using TensorFlow with controlled environmental settings.

Dataset description and preprocessing

The experimental study employed the publicly available PlantVillage dataset (Gonzalez-Huitron et al., 2021), a standardized benchmark for plant disease detection research. For the tomato crop subset, the dataset comprised a balanced collection of 1,982 high-resolution RGB leaf images equally distributed between two critical classes: early blight infection (991 images) and healthy specimens (991 images), as depicted in Fig. 1.

The original images exhibited variable resolutions ranging from 256 × 256 pixels up to 1024 × 1024 pixels. To ensure consistency for model input, all images underwent center-cropping and uniform resizing to 224 × 224 pixels using bilinear interpolation, followed by pixel value normalization to the [0,1] range through division by 255. This standardized preprocessing facilitated optimal feature extraction while maintaining biological relevance of the visual patterns.
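The crop-and-normalize step described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the helper names `center_crop_box` and `normalize_pixel` are hypothetical, and the 1024 × 768 example size is invented to show a non-square input. The bilinear resize to 224 × 224 would follow the crop and is left to an imaging library.

```python
def center_crop_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


def normalize_pixel(value):
    """Map an 8-bit intensity (0-255) into the [0, 1] range."""
    return value / 255.0


# Hypothetical 1024 x 768 photograph (size chosen for illustration only).
box = center_crop_box(1024, 768)
print(box)                   # (128, 0, 896, 768)
print(normalize_pixel(255))  # 1.0
```

Cropping to a centered square before resizing keeps the leaf's aspect ratio intact, which a direct resize of a non-square image would distort.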

A comprehensive data augmentation strategy was implemented exclusively on the training set to improve model generalization. The augmentation pipeline included random rotations (0–90° range), width and height shifts (± 40% of image dimensions), both horizontal and vertical flipping, brightness adjustments (20–120% of original intensity), and zoom variations (± 40% magnification). These transformations effectively simulated field conditions where leaves may appear at different orientations, scales, and lighting conditions.

The dataset was partitioned into training (80%) and validation (20%) subsets, maintaining equal class distribution in each split. The validation set received only rescaling normalization without any augmentation to provide an unbiased evaluation of model performance. This rigorous approach to data preparation ensured the reliability of subsequent accuracy metrics while addressing potential overfitting concerns through the augmented training corpus.
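A class-balanced 80/20 partition of this kind can be sketched as follows. The function name `stratified_split`, the label strings, and the fixed seed are assumptions made for illustration; the per-class counts printed simply follow from applying the 80/20 ratio to 991 images per class and are not figures reported by the authors.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=42):
    """Split sample indices into train/validation, preserving class ratios."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed (assumed) for a repeatable split
    train, val = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)
        val.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return train, val


labels = ["early_blight"] * 991 + ["healthy"] * 991
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))  # 1586 396
```

Splitting each class separately guarantees that both subsets stay balanced, which a single shuffle over the pooled images would only approximate.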

The selection of this particular dataset and preprocessing methodology was based on three key considerations: (1) the PlantVillage collection represents the most widely-adopted benchmark in plant pathology imaging research, enabling direct comparison with prior work (see Fig. 1); (2) the balanced class distribution prevented model bias toward either healthy or diseased classification; and (3) the augmentation parameters were empirically tuned to reflect realistic agricultural imaging conditions while preserving diagnostically relevant features.

Independent validation dataset (tomato_dataset_v2)

This dataset comprises 30,609 tomato leaf images across 10 classes (e.g., bacterial spot, early blight, healthy, late blight, leaf mold). It was cleaned, organized, and augmented to improve generalization. Preprocessing included resizing to 224 × 224, normalization, and data augmentation. The data were split into training (80%) and validation (20%) subsets.

Training dataset

The training dataset is vital for training the Early Blight detection model. It is generated using TensorFlow’s Keras ImageDataGenerator, which applies augmentation techniques like rescaling, rotation and flipping to increase the size and diversity of the dataset artificially. This augmentation enhances the model’s capacity to generalize across different conditions. The dataset contains images of both healthy tomato leaves and leaves affected by early blight.

Validation dataset

The validation dataset serves as an independent benchmark to assess the trained model’s performance. Like the training dataset, it is generated using ImageDataGenerator, and it comprises images not seen during training. This dataset safeguards against overfitting and gauges the model’s capacity to generalize effectively to new, unseen examples of early blight. Augmenting the training dataset enriches the model’s adaptability, while the validation dataset ensures reliability in real-world scenarios. Researchers optimize model hyperparameters and architecture by analyzing validation accuracy and loss to achieve early blight detection²⁸. Together, the training and validation datasets enable the model to learn and generalize, making them vital for accurate early blight identification.

Method

The primary dataset used for training and validation was the publicly available PlantVillage dataset, comprising 1,982 tomato leaf images equally split between “Early Blight” (991) and “Healthy” (991). Images were preprocessed via center-cropping, resizing to 224 × 224 pixels, and pixel normalization to [0,1]. To enhance generalization, comprehensive augmentation (rotation up to 90°, width/height shifts ± 40%, horizontal and vertical flips, brightness scaling 0.2–1.2, and zoom ± 40%) was applied to the training set. The dataset was partitioned into training (80%) and validation (20%) subsets with balanced class distribution.

To evaluate generalization beyond controlled settings, an additional validation was performed on an independent dataset, tomato_dataset_v2, which contains field-captured tomato leaves exhibiting natural noise, background clutter, and variable illumination. This independent validation enables a realistic assessment of the framework’s robustness.

Evaluation metrics

The proposed method is evaluated using standard metrics, including accuracy, precision, recall, and F1 score. Performance is assessed across the different dataset categories to ensure balanced classification, and confusion matrices and ROC curves are used to provide a comprehensive view of the model’s capabilities. The consistency of results is analyzed to confirm that the model generalizes to unseen data. Hyperparameter tuning, using techniques such as grid search or Bayesian optimization, is performed to optimize the model’s architecture and training parameters. Finally, the proposed method is compared with existing state-of-the-art approaches.
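For reference, the four standard metrics named in this section can be computed from the cells of a binary confusion matrix. The sketch below uses their textbook definitions; the function name `classification_metrics` and the example counts are illustrative and are not results from this study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative counts only (not taken from the paper).
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Here "positive" would correspond to the early blight class, so recall measures the share of diseased leaves that are caught, and precision the share of disease alerts that are correct.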

The strengths and weaknesses of the proposed method are highlighted in terms of accuracy, efficiency, and scalability. The method is also validated in real-world scenarios with varying lighting conditions and environmental factors to assess its adaptability and resilience to the challenges encountered in practical agricultural settings.

Model architecture

The proposed framework modifies the standard MobileNetV2 architecture to optimize feature extraction for early blight detection while maintaining computational efficiency. The base network retains MobileNetV2’s core structure (a depth multiplier of 1.0 and 53 convolutional layers with ReLU6 activations) but introduces three critical enhancements. First, the initial stride is reduced from 2 to 1 in the input layers to preserve fine-grained leaf texture patterns. Second, two Custom_Feature_Extraction_Blocks are appended: Custom_Feature_Extraction_Block1 employs a 3 × 3 convolutional operation with 32 filters, batch normalization, ReLU activation, and 0.5 dropout, while Custom_Feature_Extraction_Block2 uses a 5 × 5 convolution (64 filters) with Swish activation, chosen for its superior gradient propagation in deeper networks.

Third, L2 regularization (λ = 0.01) is applied to all custom layers to mitigate overfitting.

The classification head replaces MobileNet’s original top layers with a GlobalAveragePooling2D layer followed by a 128-unit dense layer (ReLU activation) and a final sigmoid output. This adaptation reduces parameters by 18% compared to traditional fully connected heads while improving spatial invariance. Feature dimensionality transitions are illustrated in Fig. 2, showing how the input (224 × 224 × 3) is progressively transformed through inverted residual blocks (expansion factor 6) and custom layers into a 128-D discriminative embedding. These modifications collectively address the trade-off between model complexity and agricultural deployment constraints, as evidenced by the 97% validation accuracy achieved with only a 3.4% increase in FLOPs over baseline MobileNetV2.

The MobileNet architecture was selected as the base model due to its optimal trade-off between computational efficiency and feature extraction capability, making it suitable for deployment in resource-constrained agricultural environments. Prior benchmarking on the PlantVillage dataset compared MobileNet against ResNet50, EfficientNetB0, and DenseNet201. MobileNet achieved comparable accuracy (97% vs. 96.5% for ResNet50) with significantly lower inference time (180 ms vs. 320 ms for ResNet50) and memory footprint (16 MB vs. 98 MB for DenseNet201). This aligns with the study’s goal of balancing performance and practicality for real-world farm deployments. Custom layers (Custom_Feature_Extraction_Block1, Custom_Feature_Extraction_Block2) were introduced to augment MobileNet’s lightweight structure with task-specific feature extraction, addressing early blight’s subtle visual patterns.

This custom architecture integrates a pre-trained MobileNet base model with custom-designed top layers and auxiliary post-processing strategies for feature extraction and evaluation using ensemble classifiers.

Base model selection: MobileNet is employed as the backbone model due to its lightweight architecture, making it well-suited for mobile and resource-constrained environments. It offers an optimal trade-off between computational efficiency and performance.

Custom top layers (feature extraction): Two specialized feature extraction blocks, named Custom_Feature_Extraction_Block1 and Custom_Feature_Extraction_Block2, are appended atop the MobileNet base. These blocks consist of convolutional operations with specific filter sizes (3 × 3 and 5 × 5), chosen to capture both the fine-grained textures and the larger lesion patterns characteristic of early blight. Each block includes batch normalization, an activation (ReLU in the first block, Swish in the second), max pooling, and dropout. A GlobalAveragePooling2D layer is utilized to reduce spatial dimensions, followed by a Dense layer with sigmoid activation for binary classification. This setup enables the model to extract secondary, task-specific features not captured by the base model.

Custom_Feature_Extraction_Block1 employs a 3 × 3 convolutional layer with 32 filters, followed by max pooling and dropout.
Custom_Feature_Extraction_Block2 applies a 5 × 5 convolutional layer with 64 filters, also followed by max pooling and dropout.

These configurations are tailored for image classification tasks, with filter sizes and depths chosen to extract hierarchical features effectively. The inclusion of dropout layers helps mitigate overfitting.
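The architecture described in this section might be assembled in Keras roughly as follows. This is a hedged sketch rather than the authors' implementation: filter counts, kernel sizes, activations, the 0.5 dropout of the first block, the L2 coefficient, and the 128-unit head come from the text, whereas `"same"` padding, the 2 × 2 pooling window, and the second block's dropout rate are assumptions. The stride change in the input layers is not reproduced, and the names `CUSTOM_BLOCKS` and `build_modified_mobilenet` are hypothetical.

```python
# Block settings from the text; entries marked "assumed" are not stated there.
CUSTOM_BLOCKS = [
    {"name": "Custom_Feature_Extraction_Block1", "filters": 32, "kernel": 3,
     "activation": "relu", "dropout": 0.5},
    {"name": "Custom_Feature_Extraction_Block2", "filters": 64, "kernel": 5,
     "activation": "swish", "dropout": 0.5},  # dropout rate assumed
]
L2_LAMBDA = 0.01


def build_modified_mobilenet(weights="imagenet"):
    """Assemble MobileNetV2 + two custom blocks + a binary head (sketch)."""
    import tensorflow as tf  # lazy import keeps the config usable without TF

    layers = tf.keras.layers
    reg = tf.keras.regularizers.l2(L2_LAMBDA)

    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False,
        weights=weights, alpha=1.0)
    x = base.output  # 7 x 7 x 1280 feature map for a 224 x 224 input

    for block in CUSTOM_BLOCKS:
        x = layers.Conv2D(block["filters"], block["kernel"],
                          padding="same",  # assumed
                          kernel_regularizer=reg, name=block["name"])(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation(block["activation"])(x)
        x = layers.MaxPooling2D(pool_size=2)(x)  # window size assumed
        x = layers.Dropout(block["dropout"])(x)

    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(128, activation="relu", kernel_regularizer=reg)(x)
    output = layers.Dense(1, activation="sigmoid")(x)
    return tf.keras.Model(inputs=base.input, outputs=output)
```

Calling `build_modified_mobilenet(weights=None)` builds the same graph without downloading ImageNet weights, which is convenient for inspecting layer shapes with `model.summary()`.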

Data augmentation: To enhance generalization and increase data variability, the model leverages ImageDataGenerator to perform on-the-fly data augmentation, applying the following techniques:

Rescaling (1/255): Normalizes pixel values to the [0, 1] range, facilitating faster and more stable convergence by avoiding gradient instability.
Shear Range (0.4): Introduces shear transformations to make the model robust to angular distortions.
Zoom Range (0.4): Simulates scale variations in objects, enabling scale-invariant feature learning.
Rotation Range (90°): Allows the model to learn from images with diverse orientations—critical when object orientation is variable.
Width and Height Shift Range (0.4): Enables recognition of spatially shifted objects, improving translation invariance.
Brightness Range (0.2 to 1.2): Adjusts image brightness to simulate varying lighting conditions.
Horizontal Flip (True): Enhances generalization by presenting mirrored versions of images.
Vertical Flip (True): Further diversifies orientation, especially valuable in domains where objects appear in multiple alignments.

These augmentation strategies introduce meaningful variability into the training set, helping the model generalize better to real-world conditions. All transformations are applied dynamically during training to maximize their effectiveness as described in Table 1 below.
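The settings listed above map directly onto keyword arguments of Keras' `ImageDataGenerator`. The sketch below collects them in one place; the dictionary and function names, the directory arguments, and the batch size are hypothetical, and the one-subfolder-per-class directory layout expected by `flow_from_directory` is an assumption about how the images are stored.

```python
# Values copied from the list above; keys are ImageDataGenerator keywords.
TRAIN_AUGMENTATION = {
    "rescale": 1.0 / 255,
    "shear_range": 0.4,
    "zoom_range": 0.4,
    "rotation_range": 90,
    "width_shift_range": 0.4,
    "height_shift_range": 0.4,
    "brightness_range": (0.2, 1.2),
    "horizontal_flip": True,
    "vertical_flip": True,
}
# The validation set is only rescaled, never augmented.
VALIDATION_AUGMENTATION = {"rescale": 1.0 / 255}


def make_generators(train_dir, val_dir, batch_size=32):
    """Build training/validation iterators (directory layout is assumed)."""
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    train_gen = ImageDataGenerator(**TRAIN_AUGMENTATION).flow_from_directory(
        train_dir, target_size=(224, 224), class_mode="binary",
        batch_size=batch_size)
    val_gen = ImageDataGenerator(**VALIDATION_AUGMENTATION).flow_from_directory(
        val_dir, target_size=(224, 224), class_mode="binary",
        batch_size=batch_size, shuffle=False)
    return train_gen, val_gen
```

Keeping the validation generator unshuffled makes its predictions line up with the file order, which simplifies building confusion matrices afterwards. Note that Keras documents `shear_range` as an angle in degrees, so a value of 0.4 is a very mild shear.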

Model compilation & performance: The model is compiled using binary cross-entropy as the loss function, the Adam optimizer, and evaluation metrics such as accuracy, precision, recall, and F1-score. Binary cross-entropy is appropriate for binary classification tasks, as it measures the divergence between predicted probabilities and actual labels, which is essential for models designed to output class probabilities. The Adam optimizer, known for its adaptive learning rate, momentum, and bias correction, ensures efficient and stable convergence during training. To further improve learning dynamics, a learning rate scheduler is implemented to systematically decrease the learning rate after every five epochs, allowing the model to fine-tune its learning and enhance convergence in later training stages. Following training, the model extracts feature representations from the MobileNet base for each image in the validation set, consistent with standard transfer learning approaches where pre-trained networks serve as feature extractors for downstream tasks.
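The loss and the step-wise learning rate schedule described above can be written out in a few lines. This is an illustrative sketch: the five-epoch interval comes from the text, while the initial rate of 1e-3 and the decay factor of 0.5 are assumptions, since the paper does not state them here. The function names are hypothetical.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over paired labels and probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() stays finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def step_decay(epoch, initial_lr=1e-3, factor=0.5, every=5):
    """Learning rate at `epoch`, multiplied by `factor` every `every` epochs.

    initial_lr and factor are assumed values, not taken from the paper.
    """
    return initial_lr * factor ** (epoch // every)


print(round(binary_cross_entropy([1, 0], [0.9, 0.1]), 4))  # 0.1054
print([step_decay(e) for e in (0, 4, 5, 10)])
```

In a Keras training loop, a function with the shape of `step_decay` could be passed to the `LearningRateScheduler` callback so that the optimizer's rate is updated at the start of each epoch.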

To assess the quality and generalization of the learned features, several ensemble classifiers—including Random Forest, Support Vector Machine (SVM), and Gradient Boosting—are applied. These classifiers are evaluated using key performance metrics such as accuracy, precision, recall, and F1-score. Employing multiple classifiers offers a broader understanding of the model’s adaptability and robustness across different decision boundaries. For interpretability, the model visualizes filters from its initial convolutional layer, providing insight into the types of patterns being detected early in the network. Additionally, the t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm is used to project high-dimensional feature vectors into two dimensions, enabling intuitive visualization of how well features are clustered and separated by class.
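One simple way to combine the three classifiers according to their validation performance is a weighted soft vote. The sketch below is a stand-in for illustration only: this section does not spell out the meta-learning procedure, so the normalization rule, the function names, and every score used here are hypothetical.

```python
def performance_weights(val_scores):
    """Normalize validation scores so the weights sum to 1."""
    total = sum(val_scores.values())
    return {name: score / total for name, score in val_scores.items()}


def weighted_vote(probabilities, weights, threshold=0.5):
    """Combine per-classifier P(early blight) into a label and a score."""
    score = sum(weights[name] * p for name, p in probabilities.items())
    return int(score >= threshold), score


# Hypothetical validation accuracies, for illustration only.
weights = performance_weights(
    {"random_forest": 0.90, "svm": 0.80, "gradient_boosting": 0.70})

# Hypothetical predicted probabilities for a single leaf image.
label, score = weighted_vote(
    {"random_forest": 0.8, "svm": 0.4, "gradient_boosting": 0.3}, weights)
print(label, round(score, 4))  # 1 0.5208
```

In this example a plain majority vote would output 0, because two of the three classifiers fall below 0.5, whereas the weighted vote follows the classifier with the strongest validation score. That difference is the motivation for moving beyond simple averaging or voting.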

The model also generates predictions on individual validation images and presents a detailed classification report, summarizing performance across all major metrics. This detailed feedback helps identify how the model performs across varying scenarios and image types. In summary, the custom architecture leverages transfer learning with MobileNet, integrates custom top layers for specialized feature extraction, utilizes data augmentation to promote robustness, applies multiple classifier evaluations for generalization assessment, and incorporates visualization techniques for model interpretability. The chosen design principles and implementation strategies align with best practices in computer vision and deep learning, aiming for optimal performance while maintaining computational efficiency.

Table 1 Summary of preprocessing and augmentation steps.
