Introduction

Handwritten character recognition is a classic problem in pattern recognition and computer vision. It plays an important role in fields such as robot vision, automated postal address interpretation, check processing, digitization of historical manuscripts, and learning tools1. Although substantial progress has been achieved in the recognition of Latin and Chinese scripts, recognition of Arabic script remains a challenging problem due to the cursive nature of the script, context-sensitive letter-form variations, and differences in handwriting style2,3.

Table 1 Arabic alphabet characters used in the study. (Lists all 28 Arabic letters included in the dataset).

Arabic, one of the most widely spoken languages in the world, comprises 28 basic letters that can take on different shapes depending on their position in a word (initial, medial, final, or isolated)4. The complexity of these variations poses a unique challenge to automated recognition systems. Table 1 lists the Arabic alphabet characters used in this study.

Several previous studies have attempted to address Arabic handwritten character recognition using different techniques. For example, Al-Omari et al. applied Hidden Markov Models (HMMs) to recognize Arabic script and achieved moderate accuracy with constrained datasets5. Another study by Abandah and Khedher used structural and statistical features with a neural network classifier6.

More recently, researchers have explored the use of Convolutional Neural Networks (CNNs), which offer superior performance by learning spatial hierarchies directly from the data. For instance, Jaber et al. proposed a deep CNN model for recognizing isolated Arabic characters and achieved high accuracy on the AHCD dataset7. Similarly, El-Sawy et al. introduced a dataset and CNN model achieving promising results for printed and handwritten Arabic letters8. Recent works have introduced more advanced architectures such as Vision Transformers (ViT) and attention-enhanced CNNs: one study adapted ResNet-based CNNs for Arabic script, achieving robust results with attention modules9, and another proposed an attention-integrated CNN for cursive Arabic scripts10. These studies highlight the growing trend toward hybrid models combining CNNs and transformer-based encoders.

Traditional machine learning methods such as Support Vector Machines (SVM) and K-Nearest Neighbors (KNN) depend on handcrafted feature extraction, which makes them error-prone and less robust across diverse handwriting samples11. By learning directly from raw pixel data, CNNs have achieved extraordinary success in image classification and recognition tasks by developing hierarchical representations12,13. Recent developments in Arabic handwriting recognition have introduced advanced deep learning models, including HATFormer (2024)14, which employs a transformer-based encoder-decoder tailored for historical Arabic texts, and Qalam (2024)15, a multimodal large language model for OCR and handwriting. Similarly, hybrid models such as “Bridging the Gap” (2025)16 combine CNN and transformer layers to achieve high accuracy on IFN/ENIT, while KHATT (2024)17 and Arabic Handwriting Transformers (2025)18 offer segmentation-free recognition of connected scripts19. These approaches illustrate the field’s rapid evolution and highlight the need for lightweight, high-accuracy alternatives such as the model proposed in this study.

This study proposes a CNN-based approach to classify isolated handwritten Arabic characters. The system is trained on a dataset of digitized handwritten samples and evaluated using standard performance metrics. Our architecture is designed to optimize both accuracy and computational efficiency, aiming to contribute to the growing body of research in Arabic optical character recognition (OCR).

Table 2 Sample data matrix for handwritten Arabic characters. (Displays examples of annotated grayscale character images).

Methods

This study presents a Convolutional Neural Network (CNN) architecture designed to classify isolated handwritten Arabic characters. The complete procedure includes data collection, preprocessing, model architecture design, training, and evaluation.

Data collection and preprocessing

The dataset includes 28 classes, each with 1000 grayscale samples of size 32 × 32 pixels. These samples were collected from 150 native Arabic writers across various age groups. Images were stored in PNG format and manually annotated. The dataset ensures uniform class distribution and covers common handwriting variability. Table 2 presents a data matrix comprising grayscale images of individual characters annotated with their respective class labels19. Standard preprocessing techniques were applied, including resizing images to a fixed resolution, normalization of pixel values, and data augmentation (e.g., rotation, shifting, and scaling)20 to enhance model generalization and robustness. The augmentation pipeline incorporated random rotations within ±15°, horizontal and vertical shifts of up to 10% of the image dimensions, and zoom scaling between 90% and 110%. These augmentations simulate common handwriting variability such as character slant, baseline shift, and diacritic displacement.
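For illustration, the normalization and augmentation ranges above can be sketched with Keras's ImageDataGenerator. This is a minimal sketch under the stated settings, not the authors' exact pipeline; the placeholder arrays stand in for the real dataset.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation ranges as described above: rotations within ±15°,
# shifts of up to 10% of the image dimensions, zoom between 90% and 110%.
# Pixel values are normalized to [0, 1] via `rescale`.
augmenter = ImageDataGenerator(
    rescale=1.0 / 255.0,
    rotation_range=15,
    width_shift_range=0.10,
    height_shift_range=0.10,
    zoom_range=[0.90, 1.10],
    fill_mode="nearest",
)

# Placeholder batch of 32x32 grayscale character images (N, 32, 32, 1).
x_train = np.random.randint(0, 256, size=(64, 32, 32, 1)).astype("float32")
y_train = np.random.randint(0, 28, size=(64,))  # 28 character classes

flow = augmenter.flow(x_train, y_train, batch_size=32)
x_batch, y_batch = next(flow)  # one batch of augmented, normalized samples
```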

Fig. 1 CNN architecture for Arabic handwritten character classification. (Depicts the structure of the proposed Convolutional Neural Network).

CNN architecture design

The proposed CNN architecture is illustrated in Fig. 1. It consists of multiple convolutional layers followed by max-pooling operations to progressively extract spatial features. ReLU activation functions are employed after each convolutional layer to introduce non-linearity. The extracted features are then passed through fully connected layers leading to a softmax output layer for classification. The architectural details, including the number of filters and kernel sizes for each layer, are summarized in Table 321. To further enhance feature discrimination for visually similar characters (e.g., ح/ج or س/ش), architectural refinements such as squeeze-and-excitation (SE) blocks can be integrated to dynamically emphasize diacritic-sensitive channels. Additionally, orientation-aware filters may help distinguish curvature nuances critical for dot placement and stroke directionality.
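The sketch below shows how an architecture of this shape could be written in Keras. The filter counts and kernel sizes are illustrative placeholders; the values actually used are those given in Table 3. The dropout rate and L2 weight decay follow the implementation details reported later in this section.

```python
from tensorflow.keras import layers, models, regularizers

NUM_CLASSES = 28  # one class per Arabic letter

# Illustrative layer sizes; the exact specifications are in Table 3.
model = models.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(256, activation="relu",
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dropout(0.5),  # dropout on the dense layers, as in the paper
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.summary()
```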

Table 3 Detailed CNN architecture specifications including layer types, kernel sizes, strides, paddings, output dimensions, and activations.

Model training

The CNN model was trained using the backpropagation algorithm with the Adam optimizer. The categorical cross-entropy loss function was used to evaluate model predictions. The model was trained over multiple epochs with mini-batch stochastic gradient descent, employing early stopping to prevent overfitting. Training and validation loss curves demonstrate the model’s convergence and learning stability22.

Implementation details

The model was implemented in TensorFlow 2.9 and trained on an NVIDIA RTX 3090 GPU. Training used a batch size of 32, a learning rate of 0.001 with a cosine annealing schedule, dropout of 0.5 for the dense layers, and L2 regularization (weight decay = 0.001). Early stopping was based on validation loss. Cross-validation was performed with an 80/20 stratified split.
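A minimal training sketch matching these settings, assuming the `model` from the architecture sketch and the `x_train`/`y_train` arrays from the augmentation sketch above. The patience value is an assumption; `validation_data` is fed from an explicit stratified split, since Keras's own `validation_split` is not stratified.

```python
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Stratified 80/20 split, as described above.
x_tr, x_val, y_tr, y_val = train_test_split(
    x_train, y_train, test_size=0.20, stratify=y_train, random_state=42)
y_tr_1h = tf.one_hot(y_tr, 28)
y_val_1h = tf.one_hot(y_val, 28)

# Cosine annealing of the learning rate from the initial 0.001.
EPOCHS, BATCH = 100, 32
steps_per_epoch = max(1, len(x_tr) // BATCH)
schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=0.001, decay_steps=EPOCHS * steps_per_epoch)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=schedule),
    loss="categorical_crossentropy",  # labels are one-hot encoded
    metrics=["accuracy"],
)

# Early stopping monitored on validation loss.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True)

history = model.fit(
    x_tr, y_tr_1h,
    validation_data=(x_val, y_val_1h),
    batch_size=BATCH, epochs=EPOCHS,
    callbacks=[early_stop],
)
```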

Evaluation metrics

Model performance was evaluated using standard classification metrics, including accuracy, precision, recall, and F1-score. A confusion matrix was generated to visualize classification performance across all character classes, as shown in Fig. 3. Comparative analysis with traditional machine learning models, including SVM and KNN, was conducted to benchmark the CNN’s effectiveness. Figure 7 illustrates the performance comparison, and Table 4 summarizes the quantitative results23. To interpret model misclassifications, gradient-weighted class activation maps (Grad-CAM) were computed for high-confusion character pairs, highlighting the spatial focus of the model on ambiguous regions (e.g., dot placements in ج vs. ح). Furthermore, controlled perturbations including random occlusions (partial glyphs), Gaussian blur, and contrast degradation were introduced to evaluate model robustness under degraded or real-world scanning conditions.
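For reference, these metrics can be computed with scikit-learn from the validation predictions; a sketch assuming the `model`, `x_val`, and `y_val` from the training sketch above.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)

# Predicted class index for each validation image.
y_pred = np.argmax(model.predict(x_val), axis=1)

print("accuracy:", accuracy_score(y_val, y_pred))
# Per-class precision, recall, and F1-score for all 28 letters.
print(classification_report(y_val, y_pred, digits=3))

# 28x28 confusion matrix: rows are true classes, columns are predictions.
cm = confusion_matrix(y_val, y_pred, labels=list(range(28)))
```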

Fig. 2 System flowchart for Arabic character recognition algorithm. (Outlines the overall recognition pipeline from input to classification).

Statistical validation

To assess the statistical significance of performance differences, a paired t-test was conducted on the classification accuracies obtained from cross-validation folds.
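A sketch of this test with SciPy, using illustrative per-fold accuracies (the actual fold-level values are not reported here):

```python
from scipy import stats

# Illustrative per-fold accuracies; each fold's CNN result is paired
# with the competing model's result on the same fold.
cnn_acc = [0.965, 0.970, 0.968, 0.971, 0.966]
svm_acc = [0.851, 0.855, 0.849, 0.858, 0.852]

t_stat, p_value = stats.ttest_rel(cnn_acc, svm_acc)  # paired t-test
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")        # p < 0.01 -> reject H0
```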

Data augmentation

To better simulate handwriting variation, future augmentation pipelines should include elastic distortions to mimic pen-pressure and stroke-stretch variation, ink-smudge artifacts produced with morphological filters to simulate aging documents, and style transfer with CycleGANs to generate samples from different demographic groups (e.g., elderly or child handwriting), thereby increasing robustness and generalization.
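Of these, elastic distortion is straightforward to prototype. The sketch below follows the standard smooth-random-displacement-field formulation (Simard et al.); the parameter values are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_distort(image, alpha=8.0, sigma=3.0, rng=None):
    """Elastic distortion: a smooth random displacement field warps
    strokes, mimicking pen-pressure and stretch variation. `alpha`
    scales the displacement magnitude, `sigma` its smoothness."""
    rng = rng or np.random.default_rng()
    h, w = image.shape
    dx = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    y, x = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.array([y + dy, x + dx])
    return map_coordinates(image, coords, order=1, mode="reflect")

# Example: distort one 32x32 grayscale character image.
img = np.random.rand(32, 32)
warped = elastic_distort(img)
```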

Results

The performance of the proposed Convolutional Neural Network (CNN) model was evaluated using a dataset of isolated handwritten Arabic characters. The results are presented in terms of classification accuracy, confusion matrix analysis, comparative performance with traditional models, and statistical significance testing.

Classification accuracy

The CNN achieved an average validation accuracy of 96.8% using 5-fold stratified cross-validation; Table 4 presents the final metrics after convergence. This performance demonstrates the model’s capacity to extract distinctive features from handwritten character images while generalizing well across varied handwriting styles.

Table 4 Classification results of CNN, SVM, and KNN models. (Compares performance metrics across three classifiers).

Confusion matrix

Figure 3 presents the confusion matrix illustrating the model’s predictions versus the true labels across all Arabic characters. The matrix highlights the model’s high accuracy for most classes, with minimal confusion between characters that share similar visual structures. This suggests that the CNN is capable of distinguishing subtle differences in character morphology.

Fig. 3 Confusion matrix of CNN predictions for Arabic character classes. (Shows prediction accuracy and common misclassifications across all classes).

Error analysis and robustness testing

To better understand the misclassifications observed in Fig. 3, Grad-CAM visualizations were generated for the five most frequently confused character pairs. As illustrated in Fig. 4, the model’s attention focuses heavily on ambiguous stroke regions, such as diacritic dots and terminal curves. This highlights that most classification errors arise from minor spatial misplacements in structurally similar glyphs.
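A compact Grad-CAM sketch in TensorFlow is given below for reference. The layer name is an assumption; in practice the last convolutional layer of the trained model would be passed in.

```python
import tensorflow as tf

def grad_cam(model, image, conv_layer_name, class_index):
    """Grad-CAM heatmap (values in [0, 1]) for one (32, 32, 1) image."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])
        score = preds[:, class_index]            # score of the target class
    grads = tape.gradient(score, conv_out)       # gradients w.r.t. feature maps
    weights = tf.reduce_mean(grads, axis=(1, 2))  # channel importance weights
    cam = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights[0], axis=-1))
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```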

Fig. 4 Grad-CAM visualization for the top-5 misclassified character pairs. Grad-CAM heatmaps for the five most commonly misclassified Arabic character pairs. Each column represents a character pair (e.g., ح/ج or س/ش), with heatmaps highlighting regions of high model attention. Red areas indicate the most influential zones for classification, often stroke endings and diacritic positions that cause confusion.

As shown in Fig. 4 (Grad-CAM visualization), the CNN focuses heavily on upper and lower diacritic regions when differentiating characters. For instance, misclassification of ح and ج appears to stem from slight dot displacement. In a secondary experiment, dots and small strokes were artificially shifted by ±2–3 pixels; recognition rates decreased by 6–8%, indicating sensitivity to diacritic location.

The images were further corrupted to emulate real-world conditions, including blur, contrast reduction, and cropping of 20% of each glyph to mimic possible segmentation errors. Even under these distortions, the CNN still classified 85–89% of the samples accurately, demonstrating the robustness of the proposed model while leaving room for improvement on occluded images.

Diacritic sensitivity analysis

To evaluate the sensitivity of the model to diacritic positioning—a known challenge in Arabic OCR—a controlled perturbation test was performed. Dots were artificially displaced from 1 to 5 pixels in multiple directions. As shown in Fig. 5, the CNN model experienced a gradual decline in accuracy, with performance dropping from 96.8% to 80.5% as dot displacement increased to 5 pixels. SVM and KNN suffered even steeper drops, underscoring the CNN’s relative robustness to diacritic variability.
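The displacement procedure can be approximated as below, assuming binarized character images in which diacritic dots appear as small connected components; the area threshold used to identify dots is a heuristic assumption.

```python
import numpy as np
from scipy import ndimage

def displace_dots(image, shift=(0, 3), max_dot_area=12):
    """Artificially displace diacritic dots in a binarized character
    image (ink ~ 1). Small connected components are treated as dots
    and shifted by `shift` (dy, dx) pixels; the character body is
    left untouched. A simplified sketch of the perturbation test."""
    binary = image > 0.5
    labels, n = ndimage.label(binary)
    out = image.copy()
    for i in range(1, n + 1):
        component = labels == i
        if component.sum() <= max_dot_area:  # heuristic: small blob = dot
            out[component] = 0.0                      # erase the dot
            moved = np.roll(component, shift, axis=(0, 1))
            out[moved] = image[component].mean()      # redraw it shifted
    return out
```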

Fig. 5 Accuracy drop vs. dot displacement. The x-axis shows the artificial displacement of the diacritic dot (0 to 5 pixels); the y-axis shows the classification accuracy of the CNN, SVM, and KNN models. The CNN is generally more robust to dot displacement than the traditional classifiers, but its marked accuracy loss beyond a 3-pixel offset underscores the importance of the relative position of dots in Arabic handwriting recognition.

Robustness under degraded conditions

Further tests were performed on artificially degraded samples to replicate in-use conditions. These applied Gaussian blur and smudge filters to simulate ink bleed, scan artifacts, and document aging. Examples of comparisons between original and degraded characters are shown in Fig. 6. Initial experiments indicate a decrease of 6–12% in classification accuracy in this setting, underscoring the need for robustness-aware augmentation strategies in future iterations.
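A minimal sketch of such degradations using SciPy; the blur strength, contrast factor, and smudge size are illustrative, not the values used in the experiments.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, grey_dilation

def degrade(image, blur_sigma=1.0, contrast=0.6, smudge_size=3):
    """Emulate low-quality acquisition: Gaussian blur, contrast
    reduction toward mid-gray, and a morphological smudge that
    thickens strokes like ink bleed."""
    out = gaussian_filter(image, sigma=blur_sigma)             # optical blur
    out = 0.5 + contrast * (out - 0.5)                         # lower contrast
    out = grey_dilation(out, size=(smudge_size, smudge_size))  # ink smudge
    return np.clip(out, 0.0, 1.0)

# Example: degrade one normalized 32x32 character image.
img = np.random.rand(32, 32)
noisy = degrade(img)
```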

Fig. 6 OCR degradation under blur and smudge. Examples of original (top row) and degraded (bottom row) handwritten Arabic characters. Degradations include Gaussian blurring and smudging to represent low-quality scans and ink bleed. These conditions severely reduce visual clarity, illustrating the need for robustness under real-world conditions such as document aging and low-quality acquisition.

Comparative analysis

To benchmark the CNN model, its performance was compared against two conventional machine learning algorithms, Support Vector Machine (SVM) and K-Nearest Neighbors (KNN), which achieved classification accuracies of 85.3% and 82.1%, respectively. The CNN outperformed both models, demonstrating its superior capacity for learning hierarchical spatial features. A visual performance comparison is shown in Fig. 7. Lightweight modern architectures were also benchmarked: ResNet-18 and MobileNetV2 variants were fine-tuned on the same dataset, achieving 94.5% and 93.2% accuracy, respectively. The proposed CNN outperformed these as well while maintaining computational efficiency. These comparisons reinforce the CNN’s suitability for isolated character classification in resource-constrained environments.
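For reproducibility, the SVM and KNN baselines can be sketched with scikit-learn on flattened pixel vectors, reusing the arrays from the training sketch in the Methods section. The RBF kernel and k = 5 are assumptions, as the exact baseline settings are not detailed here.

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Flatten 32x32 grayscale images into 1024-dimensional vectors.
x_tr_flat = x_tr.reshape(len(x_tr), -1)
x_val_flat = x_val.reshape(len(x_val), -1)

svm = SVC(kernel="rbf").fit(x_tr_flat, y_tr)
knn = KNeighborsClassifier(n_neighbors=5).fit(x_tr_flat, y_tr)

print("SVM accuracy:", svm.score(x_val_flat, y_val))
print("KNN accuracy:", knn.score(x_val_flat, y_val))
```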

Fig. 7 Model accuracy comparison between CNN, SVM, and KNN. (Highlights performance differences among the three classifiers).

Training and validation curves

Figure 8 shows the training and validation loss curves across epochs. Both curves converge smoothly and remain close together, indicating stable learning and confirming that early stopping effectively prevented overfitting.

Fig. 8 Training vs. validation loss curve across epochs. (Illustrates training dynamics and overfitting prevention).

Statistical significance

A paired t-test was conducted to statistically validate the performance improvements of the CNN model over the traditional classifiers. The null hypothesis stated that there was no difference in mean classification accuracy between models. The resulting p-value was < 0.01, so the CNN’s performance improvement is statistically significant. These results confirm deep learning as a powerful approach for tackling the intricate challenges of handwritten Arabic character recognition.

Discussion

The results of this study confirm the effectiveness of the proposed Convolutional Neural Network (CNN) architecture in recognizing isolated handwritten Arabic letters. As Table 4 shows, the CNN model reached an average accuracy of 96.8%, well above the results obtained by the traditional machine learning models, Support Vector Machine (SVM) and K-Nearest Neighbors (KNN), which reached 85.3% and 82.1%, respectively. Figure 7 presents a visual summary of this performance improvement, demonstrating the CNN’s superior classification capability. Previous research on Arabic handwriting recognition has often relied on handcrafted features or classical pattern recognition techniques. Al-Omari et al.5 used Hidden Markov Models (HMMs), achieving limited accuracy, particularly on larger or more variable datasets. Similarly, Abandah and Khedher6 combined structural and statistical features in a neural network, but their approach required extensive feature engineering.

In contrast, our CNN architecture (illustrated in Fig. 1) learns spatial hierarchies directly from raw input images, avoiding the limitations of manual feature extraction. This aligns with studies by Jaber et al.7 and El-Sawy et al.8, who also demonstrated strong CNN-based performance, albeit often with more computational overhead. Our CNN strikes a balance between accuracy and training efficiency, as evidenced by the smooth convergence shown in the training and validation loss curves in Fig. 8.

Furthermore, the confusion matrix in Fig. 3 provides deeper insight into the model’s classification behavior across the 28 Arabic characters (listed in Table 1). Misclassifications primarily occur among visually similar characters, which is consistent with known challenges in Arabic OCR systems24.

Although not directly related to OCR, advances in federated learning, blockchain-based privacy mechanisms, and adversarial resilience present future directions for secure deployment of handwriting models25,26,27.

Although our CNN’s performance is competitive on isolated Arabic characters, recent work has trended toward more complex architectures. For example, HATFormer (2024)14 and Arabic Handwriting Transformers (2025)18 use attention-based approaches for full-line recognition. Such transformer-based models are often too large and slow for practical rollout, so our smaller, faster-converging approach is better suited for deployment on mobile or embedded devices.

Additionally, our work is complementary to current research on federated learning and secure model deployment. For example, “Gradient Shielding” (2023) explores adversarial resilience, and “DeepAutoD” (2021) proposes privacy-preserving distributed systems28,29. Although these studies target broader security applications, their methodologies are adaptable to Arabic OCR systems deployed in decentralized or privacy-sensitive contexts.

The dataset used in this study was carefully curated to ensure equal representation of all Arabic characters, and a sample structure of the collected data is shown in Table 2. Each character was digitized and normalized through preprocessing techniques to standardize input dimensions and enhance generalization. Figure 2 presents the entire algorithmic framework, detailing the data flow and classification pipeline from initial data acquisition to final output classification.

Table 3 enumerates the CNN architecture specifics, including the number of convolutional layers, filter dimensions, and pooling methods. This structured architecture enabled the model to detect subtle character distinctions efficiently, which is essential given Arabic letters’ contextual shape transformations4.

The additional visualizations and quantitative analyses, namely the Grad-CAM maps (Fig. 4), the dot displacement sensitivity test (Fig. 5), and the degradation stress tests (Fig. 6), highlight important weaknesses of existing CNN-based handwriting models. Even when high accuracy is achieved on clean data, performance drops under subtle diacritic shifts or scanning artifacts, mirroring the challenges encountered when deploying OCR systems in practice. These observations point to directions for improving architecture and training for enhanced robustness.

The model demonstrates strong performance, yet several limitations must be acknowledged. The dataset comprises individual characters rather than complete words or sentences; as such, the model does not currently handle segmentation or the complexities of connected cursive text, a common scenario in real-world handwritten Arabic documents28.

The model was evaluated on balanced, clean data; its performance under noisy conditions and with diverse image acquisition methods, such as smartphone photos and scanned manuscripts, remains untested. The study also does not examine recognition in the presence of diacritics or stylistic variations such as calligraphy or informal scripts, which could affect practical implementation.

Table 5 Comparative analysis of the proposed CNN model and previous Arabic handwritten character recognition approaches.

To contextualize the effectiveness of the proposed CNN model, Table 5 provides a comprehensive comparative analysis between our approach and prior state-of-the-art methods for Arabic handwritten character recognition. The comparison spans key dimensions including feature extraction strategies, classification accuracy, generalization capability, statistical validation, training efficiency, model architecture, dataset characteristics, and multilingual adaptability. This structured evaluation highlights the practical and methodological advantages of the proposed system over earlier handcrafted and hybrid approaches.

Further investigation is needed to assess the model’s performance in real-world applications, including postal scanning, mobile OCR, and educational tablet input.

The introduced CNN framework can be adapted to other languages with writing complexities similar to Arabic, including Persian, Urdu, and Pashto. These languages use cursive scripts in which character shapes change with context. With retraining on suitable datasets, the presented methodology has the potential to advance multilingual OCR systems and support cross-linguistic digitization projects in historical archives, educational platforms, and cultural preservation efforts30.

This research establishes a foundational platform for developing assistive technologies that integrate handwriting recognition with additional modalities like speech input and eye-tracking to aid users with disabilities in both educational and communication settings31-33.

Conclusion

This study presented a Convolutional Neural Network (CNN) model for the classification of isolated handwritten Arabic characters. Through a carefully designed architecture and robust training process, the model achieved a high classification accuracy of 96.8%, outperforming traditional machine learning approaches such as SVM and KNN. The results, supported by statistical significance testing, validate the effectiveness of deep learning in handling the complexity of Arabic script recognition.

By eliminating the need for manual feature extraction, the proposed CNN model demonstrates superior generalization across varied handwriting styles. The analysis of the confusion matrix and training dynamics confirms the model’s reliability and efficiency. The system’s modular design combined with its performance indicates potential scalability to advanced tasks such as connected script recognition and word-level classification.

This study examined isolated characters under controlled conditions; future research needs to tackle real-world issues including noisy input, cursive script, and diacritics. The approach can also be extended to languages sharing similar morphological features, such as Persian and Urdu, to support deep learning-based OCR systems across multilingual, multicultural societies.

The system proposed in this research thus constitutes a substantial advance in Arabic handwriting recognition and provides a solid basis for future work in intelligent document processing and related linguistic applications.

The proposed system is a lightweight and flexible OCR component suitable for embedded deployment in mobile scanning applications and postal automation pipelines. In future work, the system will be extended to complete word recognition and joined-up handwriting, and to real-time operation on smartphones using open-source OCR stacks such as Tesseract.