Introduction

Research background and motivations

With the advancement of educational informatization, artificial intelligence (AI) has been increasingly applied in personalized learning and automated assessment1,2. However, most existing studies focus on general text classification or sentiment analysis and rarely address the closed-loop process of text adaptation, student proficiency mapping, and real-time feedback in high school English reading instruction3,4. Compared with large pre-trained Transformer models such as the BERT series, which require high computational resources, Text Convolutional Neural Network (Text CNN) offers natural advantages in computational efficiency, real-time performance, and sensitivity to local semantics (phrases or sentence structures), making it easier to deploy and iterate in resource-constrained classroom environments5. Moreover, integrating Text CNN with interpretability modules (e.g., key term and phrase visualization) and instructional strategies can provide teachers with actionable guidance, forming a teaching loop that connects model output, teacher intervention, and data feedback. Based on these motivations, this study proposes and validates a Text CNN application framework for high school reading instruction, focusing on three key aspects: automatic difficulty assessment of materials, personalized recommendations, and interpretable feedback. Comparative experiments were conducted in real classroom settings to evaluate the framework’s practical effectiveness.

Research objectives

The main objective of this study is to enhance the effectiveness of high school English reading instruction using the Text CNN model, focusing on three key aspects:

  1. Content adaptation: Employing Text CNN to classify reading materials and extract keywords, thereby providing personalized learning content tailored to students’ reading levels.

  2. Learning outcome evaluation: Conducting a comparative experiment to assess the effectiveness of the Text CNN-assisted tool in improving students’ reading comprehension.

  3. Teaching efficiency optimization: Investigating the use of Text CNN for automated assessment to reduce teachers’ workload and improve overall teaching efficiency.

To achieve these objectives, a comparative experiment was designed involving 60 high school students, who were divided into an experimental group and a control group. The experimental group used the Text CNN-based assistive tool alongside traditional teaching methods, while the control group relied solely on conventional textbooks and teacher guidance. Pre- and post-test comparisons, along with model performance evaluations, were conducted to validate the effectiveness of Text CNN in English reading instruction. Furthermore, this study provides both theoretical foundations and practical guidance for the development and optimization of future educational technologies.

Literature review

In recent years, deep learning-based text analysis techniques have received widespread attention for their applications in language processing and educational settings. Qin and Irshad (2024) developed an English textbook readability evaluation method based on the Text CNN model, providing an innovative solution for reading instruction. Their study demonstrated that selecting learning materials according to students’ reading abilities not only enhanced reading interest and comprehension but also achieved high evaluation accuracy (90%) on a self-constructed dataset, offering a scientific basis for English instructional design6.

In terms of reliability assessment and predictive analysis methodologies, recent studies have offered valuable references for evaluating the robustness of educational technology systems. Shehadeh et al.7 proposed a multi-state system reliability assessment method based on interval universal generating functions, using Dempster–Shafer theory and interval analysis to estimate system reliability under uncertainty7. This approach captures potential system states and their likelihoods through interval-valued belief functions, providing a refined perspective for complex system safety evaluation. Similarly, Shehadeh and Alshboul8 applied advanced ensemble machine learning algorithms to building safety prediction, achieving a prediction accuracy of 98.13% with a modified decision tree model, demonstrating the advantages of ensemble learning in complex environments8. These methodologies offer useful guidance for reliability assessment and predictive analysis in educational technology systems.

In the field of natural language processing, Khan et al.9 employed a Convolutional Neural Network–Long Short-Term Memory (CNN-LSTM) architecture for multilingual sentiment analysis in English and Roman Urdu texts. Their model performed well across multiple corpora, illustrating the complementary strengths of CNN and LSTM9. Susandri et al.10 further improved sentiment classification accuracy to 88% using a hybrid Convolutional Neural Network–Bidirectional Long Short-Term Memory (CNN-BiLSTM) model, highlighting the practical potential of deep learning for large-scale sentiment analysis10.

Regarding model interpretability, Ce and Tie11 proposed a backtracking analysis method for CNN-based text classification models and visualized results on the IMDb dataset, providing new approaches for enhancing transparency and trust in text classifiers11. Ren et al.12 introduced the Dynamic Label Alignment Strategy (DyLas) to address dynamic label changes in large-scale multi-label text classification using large language models. This method proved effective in domains such as e-commerce, news, medical coding, and legal documents12.

In reference parsing and semantic modeling, Yin and Wang13 proposed the Contrastive Prompt-based Parser for References (CONT_Prompt_ParseRef) model, which demonstrated strong robustness even in low-resource settings13. Jiang et al.14 developed the Feature Fusion and Multi-branch Graph Convolutional Network (Fpa-GCN) to improve the extraction accuracy of aspect-based sentiment triples through multi-branch graph convolution and feature fusion, showing the advantage of graph structures for capturing fine-grained contextual relations14. Chen et al.15 further proposed the Graph Cross-correlation Recommendation (GCR) method, which modeled cross-correlations between user and item subgraphs, achieving superior performance in recommendation tasks compared to mainstream approaches15.

In addition, recent years have seen a deepening of cross-disciplinary research on the integration of AI and complex system decision models, providing new reference paths for the reliability and interpretability of intelligent models in education. Shehadeh et al.7 proposed an integrated management framework combining digital twin technology with machine learning algorithms to achieve dynamic coordination within urban “water–energy–food–environment” systems. Through 3D modeling and real-time data-driven simulation mechanisms, the study enabled visualized prediction and optimized management of resource allocation, offering quantitative support for sustainable development goals. This framework emphasizes the role of AI in multi-source data fusion and feedback-driven decision-making, providing valuable insights for the design of learner behavior modeling and instructional feedback loops in educational intelligent systems16. Moreover, Shehadeh et al.7 introduced a predictive analytics approach based on an improved Extreme Gradient Boosting (XGBoost) algorithm for conflict detection in Building Information Modeling (BIM). This method significantly enhanced classification and prediction accuracy across multi-structural information, demonstrating the high robustness of machine learning models in complex semantic environments17. In addition, Shehadeh and Alshboul8 integrated virtual reality with machine learning to construct an intelligent framework for proactive detection and collaborative design, effectively reducing design conflicts by 16% and project duration by 12%. This study highlights AI’s potential in interactive visualization and human–machine collaborative optimization, and its “real-time feedback–adaptive improvement” paradigm bears clear parallels to adaptive content recommendation mechanisms based on learner responses in educational contexts18. 
Further, Alshboul et al.19 proposed a quality management decision framework based on evidence reasoning and belief functions, employing the Dempster combination rule to integrate multi-source information, enabling dynamic updates and accurate judgment under uncertainty19. This approach provides a mathematical foundation for building AI-based teaching models capable of self-learning and dynamic adjustment, particularly valuable when handling fuzzy labels and multivariate features in educational text analysis.

Overall, these studies collectively reflect the latest advances of AI in predictive analytics, data fusion, and uncertainty management within complex systems. They not only expand AI’s application boundaries in engineering and management but also offer cross-disciplinary methodological support for educational technology research. Building on these insights, this study develops a Text-CNN-based intelligent reading instruction model that enhances a data-driven instructional feedback system. The model facilitates the transition of AI from simple “model optimization” to system-level intelligence and improved instructional interpretability.

Research methodology

Adaptation of Text CNN for reading instruction tasks

CNNs were initially developed for computer vision tasks, but their natural language processing variant, known as Text CNN, has been shown to efficiently capture local contextual features in text20,21,22. Compared with large-scale pre-trained models with complex architectures, such as the BERT series, Text CNN can achieve high performance in text representation and classification while maintaining a smaller model size, faster inference speed, and lower training cost. These characteristics make it particularly suitable for resource-constrained educational scenarios, such as real-time classroom feedback systems and learning platforms requiring rapid text analysis.

The core idea of Text CNN is to apply convolutional filters of varying sizes over sequences of embedded word vectors to capture n-gram features at different scales. For example, a filter size of 2 can extract phrase-level collocations, while sizes of 3 or 4 can capture more complex semantic patterns. These local features are then aggregated through pooling layers to form a compact representation of the entire text. Unlike traditional methods, Text CNN directly models local dependencies between words, better reflecting sentence-level semantics within the broader discourse context.
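As an illustration of this mechanism, the sliding-window convolution and max pooling can be sketched in NumPy (a toy example with random kernels and embeddings, not the trained model):

```python
import numpy as np

rng = np.random.default_rng(0)

def pooled_ngram_feature(X, kernel, bias=0.0):
    """Slide a (k x d) kernel over an (n x d) embedding matrix X.

    Each window of k consecutive word vectors yields one response
    (an n-gram feature); max pooling keeps the strongest response.
    """
    n, _ = X.shape
    k = kernel.shape[0]
    fmap = np.array([np.sum(X[i:i + k] * kernel) + bias
                     for i in range(n - k + 1)])
    return fmap.max()

# Toy sentence: 10 words embedded in 8 dimensions.
X = rng.normal(size=(10, 8))

# Parallel filters of sizes {2, 3, 4}, one pooled feature per filter.
features = [pooled_ngram_feature(X, rng.normal(size=(k, 8))) for k in (2, 3, 4)]
print(len(features))  # 3
```

Concatenating the pooled responses from all filters produces the compact text representation described above.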

In English reading instruction, students often face challenges such as inaccurate assessment of text difficulty, unclear identification of key terms, and insufficient personalized exercises. To address these issues, this study adapts and extends the standard Text CNN architecture for educational tasks:

  1. Multi-scale convolution design: Parallel convolutional filters of sizes {2, 3, 4} are used to extract features at the phrase, syntactic fragment, and complex sentence levels, enabling the model to capture linguistic information at the word, sentence, and paragraph scales.

  2. Attention-based pooling mechanism: Replacing traditional max pooling, an attention-weighted strategy highlights key words and sentences relevant to reading comprehension, enhancing model interpretability and allowing teachers to intuitively identify areas of difficulty for students.

  3. Teaching difficulty mapping module: A difficulty-level mapping layer is added after the convolutional representation, aligning textual features with Common European Framework of Reference for Languages (CEFR) levels or a custom difficulty system. This enables automatic assessment of appropriate reading material levels.

  4. Personalized recommendation interface: By integrating students’ historical performance and error patterns, model outputs are connected to a recommendation module that dynamically delivers adaptive exercises, forming a closed loop of “model assessment → recommended resources → student feedback.”

Through these adaptations, Text CNN not only provides efficient text representation and classification but also deeply aligns with instructional needs, balancing real-time performance, interpretability, and personalization. In high school English reading instruction, the model can assist teachers in selecting appropriate materials while providing students with exercises tailored to their proficiency levels, demonstrating the practical value of AI technologies in educational settings.
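The attention-weighted pooling adaptation can be sketched as follows (a minimal illustration with a randomly initialized context vector, not the authors' exact module):

```python
import numpy as np

def attention_pool(C, u):
    """Attention-weighted pooling over a feature map.

    C: (L, d) feature map from the convolutional layer.
    u: (d,) context vector scoring each position.
    Returns the pooled vector and the weights; the weights can be
    visualized to show which positions the model emphasizes.
    """
    scores = C @ u
    w = np.exp(scores - scores.max())   # numerically stable softmax
    w /= w.sum()
    return w @ C, w

rng = np.random.default_rng(1)
C = rng.normal(size=(7, 16))            # 7 positions, 16 feature channels
pooled, weights = attention_pool(C, rng.normal(size=16))
print(pooled.shape)  # (16,)
```

Unlike max pooling, which keeps only one response per filter, the weight vector retains a per-position importance score that can be surfaced to teachers.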

Text CNN-based English reading material evaluation model

Model evaluation typically involves splitting the dataset into two parts: one for training the model and the other for evaluating its performance23. During the training phase, the primary task is to learn the core patterns related to readability assessment from labeled texts, which form the foundation of the model. Once trained, the model can be applied not only to texts within the dataset but also to new, unseen texts, accurately predicting their readability levels. In deep learning-based text evaluation models, the main objective is to establish a mapping from a text dataset \(\:D=\{{d}_{1},{d}_{2},...,{d}_{n}\}\) to corresponding readability levels \(\:{l}_{i}\in\:\{{G}_{1},{G}_{2},...,{G}_{m}\}\)24,25,26,27. Typically, the model consists of three components: text data representation, feature extraction, and classification. Figure 1 illustrates the overall structure of the text evaluation model.

Fig. 1. Text evaluation model.

First, the raw text undergoes preprocessing, including tokenization, stop-word removal, and word vector embedding. The processed text is then fed into the Text CNN network, which employs multi-scale convolutions and attention-based pooling to extract both local and global semantic features. Subsequently, a teaching adaptation layer maps the convolutional representations to reading difficulty levels and key terms, which are further linked to a personalized recommendation interface. After the model outputs the classification results, it is trained using cross-entropy loss and the Adam optimizer. Techniques such as Dropout, L2 regularization, and early stopping are applied to reduce the risk of overfitting. Finally, model performance is evaluated using K-fold cross-validation and an independent test set, while confidence intervals and confusion matrices are employed to assess the robustness of the results.

The implementation of text representation is a crucial prerequisite for performing text readability evaluation tasks using DL. During the text data representation phase, the Word2vec tool is commonly used to generate word vectors28. Word2vec efficiently converts words into numerical vectors and expresses the characteristics of these words in vector space. The core idea is to train word vectors of specific dimensions using a DL model; by calculating the distance between word vectors, the similarity between words can be measured. Inputting a sequence of words \(\:[{w}_{1},{w}_{2},...,{w}_{n}]\) into Word2vec yields the corresponding word vectors \(\:\mathbf{X}=[{\mathbf{x}}_{1},{\mathbf{x}}_{2},...,{\mathbf{x}}_{n}]\), where \(\:{\mathbf{x}}_{i}\in\:{\mathbb{R}}^{d}\) and n denotes the sequence length. For a sentence of length n (padding can be applied if necessary), its representation is as follows:

$$\:{\mathbf{X}}_{1:n}={\mathbf{x}}_{1}\oplus\:{\mathbf{x}}_{2}\oplus\:...\oplus\:{\mathbf{x}}_{n}$$
(1)

\(\:\oplus\:\) denotes the concatenation operation. Each row is the Word2vec vector of one word, and the rows are stacked vertically in the order of the words in the sentence. The input data size is \(\:n\times\:k\), where n denotes the number of words in the longest sentence in the training data (typically set to 64) and k represents the embedding dimension (typically set to 300).
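Under these typical settings (n = 64, k = 300), the input construction of Eq. (1) can be sketched as follows; the embedding lookup is simulated here with random vectors standing in for a trained Word2vec model:

```python
import numpy as np

n, k = 64, 300
rng = np.random.default_rng(42)

# Simulated Word2vec table: in practice this comes from a trained model.
vocab = {w: rng.normal(size=k) for w in ["the", "student", "reads", "a", "text"]}
pad = np.zeros(k)

def sentence_matrix(tokens):
    """Stack word vectors row-wise and zero-pad/truncate to n rows."""
    vecs = [vocab.get(t, pad) for t in tokens[:n]]
    vecs += [pad] * (n - len(vecs))          # zero-pad shorter sentences
    return np.vstack(vecs)                    # shape (n, k): one row per word

X = sentence_matrix(["the", "student", "reads", "a", "text"])
print(X.shape)  # (64, 300)
```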

During the feature extraction process, the CNN model is chosen as the encoder. First, the word vector representation X of the input sequence is obtained. To extract meaningful features from the sequence, convolutional kernels of different sizes are applied to X for convolution operations29,30. Using multiple kernel sizes allows relationships between words at different ranges in the sequence to be captured effectively, resulting in more representative features. Suppose a convolution kernel has size k; it operates on a segment of the sequence of length k, extracting local contextual information through a sliding-window mechanism.

Next, the convolution operation generates feature maps, where each feature map represents the response degree of a specific pattern in the sequence. To improve the generalization ability of the model and focus on important features, a non-linear activation function and pooling operations are usually added after the convolutional layer31,32,33. The non-linear activation function helps capture complex patterns in the data, while pooling reduces the feature map size, decreases computational complexity, and suppresses noise. As the convolutional kernel moves, a window matrix \(\:{\mathbf{W}}_{i}=[{\mathbf{x}}_{i},{\mathbf{x}}_{i+1},...,{\mathbf{x}}_{i+k-1}]\) containing k consecutive words is formed at each position i in the sequence. The window matrix \(\:{\mathbf{W}}_{i}\) is then convolved with the kernel matrix M, generating a feature map \(\:\mathbf{C}\in\:{\mathbb{R}}^{n-k+1}\). The feature mapping of the word window vector w at position i is calculated as follows:

$$\:{\mathbf{c}}_{i}=\sigma\:(\mathbf{w}\otimes\:\mathbf{m}+b)$$
(2)

\(\:\otimes\:\) denotes the convolution (element-wise multiplication and summation) operation, b represents the bias term, and \(\:\sigma\:\) denotes the sigmoid activation function. Through these calculations, the feature map is obtained as follows:

$$\:\mathbf{C}=[{\mathbf{c}}_{1},{\mathbf{c}}_{2},...,{\mathbf{c}}_{n-k+1}]$$
(3)

Next, max pooling is applied to the results obtained from the convolution operation, as shown in the following equation:

$$\:\widehat{\mathbf{c}}=max\left\{\mathbf{C}\right\}$$
(4)

The max pooling operation selects the maximum value from the feature map \(\:\mathbf{C}\) as the feature representation produced by that particular convolutional kernel.

For regularization, dropout is applied at the penultimate layer \(\:\mathbf{Z}=[{\widehat{\mathbf{c}}}_{1},{\widehat{\mathbf{c}}}_{2},...,{\widehat{\mathbf{c}}}_{m}]\), and an \(\:{l}_{2}\) norm constraint is applied to the weight vector. The equation is as follows:

$$\:\mathbf{y}=\mathbf{w}\cdot\:(\mathbf{z}\circ\:\mathbf{r})+b$$
(5)

\(\:\circ\:\) represents the element-wise multiplication operator, and \(\:\mathbf{r}\) denotes a masking vector of Bernoulli random variables, each equal to 1 with probability p. Additionally, after each gradient descent step, if \(\left\| {\mathbf{w}} \right\|_{2} > s\), the weight vector \(\:\mathbf{w}\) is rescaled so that \(\left\| {\mathbf{w}} \right\|_{2} = s\), thereby enforcing the \(\:{l}_{2}\) norm constraint on the weight vector.
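The dropout masking and max-norm rescaling of Eq. (5) can be sketched directly (illustrative values; the p and s used here are hypothetical, not the study's settings):

```python
import numpy as np

rng = np.random.default_rng(3)
p, s = 0.5, 3.0                     # keep probability and l2 max-norm bound

z = rng.normal(size=10)             # penultimate-layer features Z
w = rng.normal(size=10) * 5.0       # weight vector (deliberately large norm)
b = 0.1

r = rng.binomial(1, p, size=10)     # Bernoulli mask: 1 with probability p
y = w @ (z * r) + b                 # Eq. (5): y = w . (z o r) + b

# Max-norm constraint applied after a gradient step.
norm = np.linalg.norm(w)
if norm > s:
    w = w * (s / norm)              # rescale so that ||w||_2 = s
print(round(float(np.linalg.norm(w)), 4))
```

At test time the mask is dropped and the learned weights are scaled by p, following the standard dropout procedure.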

In the final step, the chosen features are forwarded to a dense Softmax layer for classification. In the text classification component, logistic regression is utilized to build a multi-class classifier, where the input vector corresponds to the feature vector generated by the CNN34. The final representation vector, v, is derived from the preceding Pooling layer and subsequently passed to the Softmax layer for classification. The equation is as follows:

$$\:Softmax\left({z}_{i}\right)=\frac{{e}^{{z}_{i}}}{\sum\:_{c=1}^{C}\:{e}^{{z}_{c}}}$$
(6)

The output value of the i-th node is denoted as \(\:{z}_{i}\), while C represents the total number of output nodes, which corresponds to the number of categories the nodes are classified into35,36,37,38. The Softmax function is used to transform the input values of a multi-class classification problem into a probability distribution within the range [0, 1], where the sum of all probabilities equals 1. The Text CNN model employs cross-entropy as the loss function, and its equation is as follows:

$$\:L=\frac{1}{N}\sum\:_{i}\:{L}_{i}=\frac{1}{N}\sum\:_{i}\:-\sum\:_{c=1}^{M}\:{y}_{ic}\text{l}\text{o}\text{g}\left({p}_{ic}\right)$$
(7)

M represents the total number of categories; \(\:{y}_{ic}\) is an indicator variable (taking values 0 or 1): it equals 1 if the true category of sample i is c, and 0 otherwise. \(\:{p}_{ic}\) denotes the predicted probability that sample i belongs to category c.
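Equations (6) and (7) can be written out directly in NumPy (a sketch of the computations, not the training code used in the study):

```python
import numpy as np

def softmax(z):
    """Eq. (6): exponentiate and normalize; subtracting the max is for stability."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, y_onehot):
    """Eq. (7): L = -(1/N) * sum_i sum_c y_ic * log(p_ic)."""
    p = softmax(logits)
    return -np.mean(np.sum(y_onehot * np.log(p), axis=1))

# Two samples, three difficulty categories (toy logits).
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 1.5, 0.3]])
y = np.array([[1, 0, 0],
              [0, 1, 0]])

probs = softmax(logits)
loss = cross_entropy(logits, y)
print(round(float(loss), 4))
```

Each row of `probs` sums to 1, and the loss decreases as the probability mass assigned to the true category grows.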

Research design for English reading instruction

This study aims to explore the potential of an English reading instruction tool based on the Text CNN model to improve teaching efficiency and effectiveness. A comparative experiment was designed with the following arrangements: (1) Experimental Subjects: Students from an English learning class at a high school were selected and divided into a control group and an experimental group, with an equal number of students in each. The English proficiency levels of the students in both groups were comparable. (2) Experimental Content: The Text CNN model was employed to classify textbooks and supplementary reading materials, providing content appropriate for students’ reading levels. The model also extracted key terms from the reading materials, helping students quickly grasp the main themes and essential information of each article. Based on students’ learning feedback and comprehension performance, personalized reading materials and exercises were delivered to the experimental group. (3) Comparison of Teaching Plans: The control group followed the traditional approach of textbook reading combined with teacher guidance. The experimental group, in addition to the traditional method, used the Text CNN-based reading support tool to assist students in selecting materials, extracting key information, and receiving personalized practice. Figure 2 illustrates the experimental process flow diagram.

Fig. 2. Experimental process flow diagram.

Finally, a multidimensional quantitative evaluation of students’ learning outcomes is conducted. By comparing the pre-test and post-test, the improvement in reading comprehension abilities between the control group and the experimental group is assessed.

Experimental design and performance evaluation

Dataset collection, experimental environment, and parameter settings

The dataset used in this study consisted of 2,000 English reading texts, primarily sourced from publicly available educational corpora (available at: https://www.kaggle.com/datasets) and extended reading materials from the school. The texts covered six thematic categories: literature, science and technology, news, social topics, law, and education. Each text ranged from 500 to 1,500 words. All texts were independently annotated by two teachers, each with more than three years of high school English teaching experience. The annotations included text type, core themes, and reading difficulty levels. In cases of disagreement between the two teachers, a third senior teacher served as an arbitrator. To ensure annotation consistency, Cohen’s Kappa coefficient was calculated, yielding a value of 0.86, indicating a high level of agreement.

Reading difficulty levels were determined using a combination of automated and manual approaches. First, texts were automatically graded using the Flesch-Kincaid and Lexile readability formulas. These results were then reviewed and adjusted by teachers according to CEFR standards, producing five final levels: Beginner, Lower-Intermediate, Intermediate, Upper-Intermediate, and Advanced. This approach ensured that the grading process was both objectively supported and aligned with actual teaching practices.
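For reference, the automated first pass can be illustrated with the standard Flesch-Kincaid grade-level formula; the sketch below uses a rough vowel-group syllable heuristic, not the production grading tool:

```python
import re

def count_syllables(word):
    """Crude heuristic: count contiguous vowel groups (minimum one)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text):
    """FK grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / sentences
            + 11.8 * syllables / len(words)
            - 15.59)

sample = "The student reads the text. The text is short."
grade = flesch_kincaid_grade(sample)
print(round(grade, 2))
```

The resulting grade scores, together with Lexile measures, were then mapped by teachers onto the five CEFR-aligned levels.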

The participants were 60 second-year high school students (average age 16.8 years, with an approximately equal gender distribution) from a key high school. Students’ English proficiency was stratified into low, medium, and high levels based on pretest scores from a standardized English reading comprehension test. Using stratified randomization, students were assigned to an experimental group and a control group, with 30 students in each, ensuring balance in pretest scores and gender distribution. Prior to the experiment, an equivalence test was conducted using independent-samples t-tests, which indicated no significant difference in pretest scores between the groups (p > 0.05), confirming their comparability.

The experiment was conducted using the Python programming language and deep learning frameworks, primarily TensorFlow and Keras, for model training. An NVIDIA GPU (GeForce GTX 1080 Ti) was employed to accelerate training, while a standard server with 32 GB of memory handled data processing. The parameters of the Text CNN model were adjusted based on prior research and the characteristics of the dataset. The model architecture consists of multiple convolutional layers for feature extraction, pooling layers to reduce dimensionality, and a fully connected layer for classification. Key hyperparameter settings are summarized in Table 1.

Table 1 Key hyperparameter settings.

To ensure robust performance evaluation, the dataset was split into training, validation, and test sets in a 70%/10%/20% ratio using stratified sampling by text category. Hyperparameter tuning (Grid or Bayesian search) was performed on the training set, with 5-fold cross-validation used to verify the stability of the final hyperparameters. The test set was reserved exclusively for final performance reporting. Evaluation metrics included accuracy, recall, and F1 score, with 95% confidence intervals calculated using bootstrap sampling (n = 1000).
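The bootstrap confidence-interval procedure (n = 1000 resamples) can be sketched as follows, with simulated predictions standing in for the real model outputs:

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated test set: five difficulty levels, roughly 90% correct predictions.
y_true = rng.integers(0, 5, size=400)
y_pred = np.where(rng.random(400) < 0.9, y_true,
                  rng.integers(0, 5, size=400))

def bootstrap_ci(y_true, y_pred, n_boot=1000, alpha=0.05):
    """Resample (true, pred) pairs with replacement and take accuracy quantiles."""
    accs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), size=len(y_true))
        accs.append(np.mean(y_true[idx] == y_pred[idx]))
    lo, hi = np.quantile(accs, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

lo, hi = bootstrap_ci(y_true, y_pred)
print(round(lo, 3), round(hi, 3))  # 95% CI bounds for accuracy
```

The same resampling scheme applies unchanged to recall and F1, since each is a function of the resampled (true, predicted) pairs.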

Performance evaluation

Performance of the Text CNN model on different types of reading materials

When evaluated on different types of reading materials, the Text CNN model achieved high accuracy, recall, and F1 scores, demonstrating strong adaptability in handling diverse text-processing tasks. Comparisons of accuracy, recall, and F1 scores across different text categories are shown in Fig. 3.

Fig. 3. Comparison of accuracy, recall, and F1 score (%) across different types of reading materials.

The Text CNN model demonstrates robustness and adaptability across various types of reading materials. The results show that the model achieves the highest performance on science & technology and education texts, with accuracy rates of 93.5% and 94.2%, respectively. This suggests that the Text CNN model excels when processing texts with clear structures and specialized vocabulary. Such materials often feature standardized language and fixed expressions, which allow the model to effectively capture underlying patterns. The model also performs well on news and literary texts, achieving accuracy rates of 92.1% and 91.8%, respectively. Although these types of texts contain more emotional expressions and complex semantic relationships, the model is still able to extract key features and make accurate classifications. In contrast, performance on social media texts is slightly lower, with an accuracy of 89.6%. This decrease can be attributed to the colloquial language and informal expressions typical of social media, which pose challenges for Text CNN when processing unstructured content and diverse forms of expression. Nevertheless, the recall rate and F1 score remain high, indicating that the model maintains good generalization and balanced performance even on these more variable texts. Performance on legal texts is comparatively lower, with an accuracy of 90.3% and a recall of 89.8%. Legal texts often include complex terminology and require precise language, demanding deeper contextual understanding and fine-grained feature extraction. As a result, the Text CNN model exhibits slightly reduced performance on these tasks.

Overall, the model demonstrates stable and consistent performance across all text types, highlighting its adaptability in diverse textual environments. In conclusion, Text CNN achieves high accuracy and strong stability, particularly excelling with texts that have clear structures and standardized terminology. While the complexity of social media and legal texts slightly affects performance, the model remains effective and supportive across a wide range of reading tasks.

Performance comparison of different models in text analysis tasks

This section compares the performance of six commonly used text analysis models. They are Text CNN, Support Vector Machine (SVM), Naive Bayes, LSTM, BERT, and Gated Recurrent Unit (GRU). The models’ performance in text classification tasks is evaluated based on accuracy, recall, and F1 score. Figure 4 shows the comparison of accuracy, recall, and F1 score across different models in text analysis tasks.

Fig. 4. Comparison of accuracy, recall, and F1 score across different models for text analysis tasks.

Figure 4 reveals that Text CNN outperforms other models, with an accuracy of 94.1%, recall of 93.8%, and F1 score of 94%. This indicates that Text CNN is highly effective in text classification tasks, reflecting its strength in detecting local patterns in texts, which makes it especially suitable for tasks like sentence classification. BERT follows closely, with an accuracy of 92.3%, recall of 91.8%, and F1 score of 92%, reflecting the advantage of its deep contextual representations, albeit at substantially higher computational cost. The LSTM model also performs quite well, with an accuracy of 90.5%, recall of 89.7%, and F1 score of 90.1%. LSTM has an advantage in capturing long-range dependencies in text, making it suitable for text classification tasks that require long sequence context. GRU, similar to LSTM but with a simpler architecture, has an accuracy of 89.8%, recall of 88.6%, and F1 score of 89.2%. Although slightly inferior to LSTM, GRU still performs well in text classification tasks and offers higher computational efficiency. The SVM model performs well in high-dimensional spaces, with an accuracy of 86.7%, recall of 85.3%, and F1 score of 86%. While its performance is lower than that of the DL models, SVM remains a reliable baseline model in text classification tasks. Finally, the Naive Bayes model performs the worst, with an accuracy of 81.4%, recall of 80.8%, and F1 score of 81.1%. Despite its computational simplicity and ease of interpretation, Naive Bayes’ assumption of feature independence limits its performance on complex text data. Overall, BERT and Text CNN perform best among all models, with Text CNN showing the most outstanding performance.

Student performance analysis

This section analyzes the English reading comprehension test scores of the experimental and control groups before and after the experiment. The main indicators for the score analysis include the average score, standard deviation, median, maximum, and minimum values. By comparing the test results before and after the experiment, this work evaluates the effectiveness of the Text CNN-based auxiliary learning tool in enhancing students’ English reading ability. Figure 5 shows the student performance analysis.

Fig. 5. Student performance analysis.

In the control group, the average score before the experiment is 65.4, with a standard deviation of 5.8, indicating some variability in the students’ performance. After the experiment, although the control group’s scores improve slightly, with the average score rising to 70.3 and the standard deviation increasing to 6.2, the change is not significant. This suggests that traditional teaching methods have limited effectiveness in improving students’ reading comprehension abilities. In contrast, the experimental group shows a more significant change after using the Text CNN-assisted tool. Before the experiment, the experimental group’s average score is 66.2, with a standard deviation of 6.1, and their performance distribution is similar to that of the control group. After the intervention, the experimental group’s average score increases to 79.6, with a standard deviation of 7.3, and the median score is 80, indicating a notable improvement. The maximum score is 90, and the minimum score is 68, showing a broader distribution, but the overall improvement is substantial. From the data analysis, it is evident that the experimental group experiences a significant improvement in scores after using the Text CNN-based auxiliary tool, especially in reading comprehension. This demonstrates that the Text CNN model’s advantages in personalized learning, keyword extraction, and content recommendation can effectively enhance students’ performance in English reading comprehension.

Discussion

The experimental results demonstrate the effectiveness of the Text CNN model in English reading instruction, particularly for science & technology and education texts, achieving accuracy rates of 93.5% and 94.2%, respectively. Nevertheless, several limitations remain. First, the dataset is relatively small, containing only 2,000 texts with a limited range of topics and genres, which may restrict the model’s generalization to broader educational contexts. Second, although techniques such as Dropout and L2 regularization were applied, deep learning models trained on limited data are still prone to overfitting, particularly on text types with high linguistic variability, such as social media and legal documents. Third, the current model does not systematically incorporate external knowledge or cross-modal information, which may constrain its performance in reading tasks requiring commonsense reasoning or deep contextual integration. Future research should evaluate model robustness on larger, more diverse corpora and explore strategies such as noise injection and adversarial training to enhance generalization.
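As a concrete illustration of the L2 regularization mentioned above, the snippet below adds a weight-decay penalty to a cross-entropy classification loss. This is a minimal NumPy sketch with an illustrative penalty strength `lam`, not the training code used in the study.

```python
import numpy as np

def regularized_cross_entropy(logits, labels, weights, lam=1e-4):
    """Cross-entropy loss plus an L2 penalty on the weight matrices,
    discouraging large weights and thus overfitting on small datasets."""
    # Numerically stable softmax over the class dimension.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    ce = -np.log(probs[np.arange(len(labels)), labels]).mean()
    l2 = sum((w ** 2).sum() for w in weights)  # squared Frobenius norms
    return ce + lam * l2

# Toy example: two samples, two classes, one weight matrix.
logits = np.array([[2.0, 0.5], [0.1, 1.5]])
labels = np.array([0, 1])
W = [np.ones((3, 2))]
loss = regularized_cross_entropy(logits, labels, W, lam=0.01)
```

Dropout works complementarily at training time by randomly zeroing activations, so the two techniques are typically combined, as they were in this study.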

Conclusion

Research contribution

This study makes contributions at the theoretical, methodological, and practical levels: (1) Theoretical Contribution: The study successfully applies the Text CNN model to high school English reading instruction, demonstrating its effectiveness in text difficulty classification, semantic feature extraction, and personalized content recommendation. This enriches the theoretical framework of deep learning applications in education. (2) Methodological Contribution: An end-to-end instructional adaptation framework is proposed, integrating multi-scale convolution, attention-based pooling, and difficulty mapping mechanisms. This framework enhances model classification performance and interpretability, providing a reusable methodological reference for future research. (3) Practical Contribution: A teaching support tool with real-time feedback and adaptive recommendation capabilities was developed, significantly improving students’ reading comprehension scores and reducing teachers’ grading workload. Experiments indicate particularly strong performance on science & technology and education texts, with accuracy exceeding 93%.
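The multi-scale convolution and attention-based pooling named in the methodological contribution can be sketched as follows. This is a toy NumPy illustration with one filter per kernel width and made-up dimensions, not the study’s actual configuration; the per-position attention weights `attn` are the kind of scores that support the interpretability claims above, since they indicate which text positions drive the pooled feature.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_feature(E, W):
    """Slide one kernel W (k, d) over embeddings E (T, d); ReLU feature map."""
    k = W.shape[0]
    out = np.array([(E[i:i + k] * W).sum() for i in range(E.shape[0] - k + 1)])
    return np.maximum(out, 0.0)

def attention_pool(h, v):
    """Attention-based pooling: softmax-weighted sum of positional features."""
    scores = h * v                       # per-position relevance scores
    a = np.exp(scores - scores.max())
    a /= a.sum()
    return (a * h).sum(), a              # pooled feature and attention weights

T, d = 12, 8                             # toy sequence length and embedding size
E = rng.normal(size=(T, d))              # stand-in for word embeddings
pooled = []
for k in (3, 4, 5):                      # multi-scale kernel widths
    W = rng.normal(size=(k, d))
    h = conv1d_feature(E, W)
    p, attn = attention_pool(h, v=1.0)
    pooled.append(p)

features = np.array(pooled)              # one pooled feature per kernel scale
```

In a full model, `features` would feed a classifier head that maps texts to difficulty levels, and the attention weights could be visualized over the source text for teachers.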

The significance of this study lies primarily in advancing the deep integration of AI technologies with English language teaching and providing new perspectives and practical pathways for the sustainable development of intelligent education. By introducing a Text CNN model, the research establishes an intelligent support framework for secondary school English reading instruction, enabling a dynamic closed-loop process in text analysis, content recommendation, and learning feedback. This demonstrates both the interpretability and practical feasibility of AI applications in educational settings. The framework not only improves the alignment of reading materials with learners’ needs and enhances the precision of learning resource delivery but also promotes a data-driven shift in instructional decision-making, making classroom teaching more scientific and personalized. The results indicate that the judicious application of AI models can effectively reduce teachers’ repetitive workload. This reduction frees up time for differentiated instruction and the development of higher-order thinking skills. Consequently, it facilitates a shift from a traditional “knowledge delivery” model to a more student-centered “learning guidance” model. Importantly, this study provides empirical evidence for education policymakers and intelligent learning system designers, offering valuable insights for the long-term development of AI-empowered basic education.

Future works and research limitations

The limitations of this study are as follows. First, the dataset used in the experiments was relatively small, with samples drawn primarily from a single school, which may limit the generalizability of the conclusions. Second, the Text CNN model focuses mainly on semantic features at the text level and does not fully account for affective, pragmatic, or cross-contextual factors in reading comprehension. Finally, the experimental period was relatively short, preventing a long-term evaluation of the model’s impact on learning continuity and knowledge transfer.

To address the study’s limitations, future research could proceed in several directions: (1) Expanding Data Scale and Diversity: Introduce reading materials from multiple regions and disciplines to construct a more representative benchmark corpus, enabling systematic evaluation of model adaptability across educational contexts. (2) Exploring Multimodal Learning: Incorporate multimodal information, such as audio and images, to create “text–audio” aligned corpora. This can improve the model’s understanding of spoken and unstructured texts, supporting complex reading tasks in real classroom settings. (3) Graph-based Modeling for Recommendation Optimization: Inspired by models such as Fpa-GCN, relationships among students, learning materials, and knowledge elements can be explicitly modeled as heterogeneous graphs. Graph neural networks can then capture semantic dependencies in complex interactions, improving the accuracy and interpretability of personalized recommendations. (4) Developing Lightweight and Explainable AI Tools: Design tools for real-world educational scenarios that are lightweight and interpretable, providing teachers with transparent and controllable model decisions without adding deployment costs. This approach can further facilitate human–AI collaborative teaching.