Abstract
In the context of the digital transformation of ideological and political education (IPE) in the new era, this study explores the interdisciplinary integration of red music and intelligent recommendation technologies. An intelligent deep learning model is developed to recommend IPE resources enhanced with red music, addressing challenges such as the low precision of traditional IPE resource delivery and limited emotional engagement. The model employs multimodal feature extraction techniques to fuse the emotional content of red music—captured via short-time Fourier transform and time-frequency attention mechanisms—with lyrical semantics. Learner profiles are constructed using dynamic cognitive diagnosis combined with Transformer-based temporal sequence modeling. Based on these profiles, precise resource recommendations are generated through heterogeneous information networks and hierarchical reinforcement learning. Experimental results indicate that the proposed model significantly outperforms the comparative methods across several metrics. Recommendation accuracy improved by 23%–35%. Educational relevance increased by up to 29%, and emotional resonance grew by 27%. Pilot tests in college I&P courses demonstrate that the model effectively enhances student engagement. Overall, this study offers a technology- and education-driven paradigm for the digital inheritance of red culture. Future work could expand the model to incorporate additional multimodal data and explore cross-cultural applications, further promoting intelligent and personalized development in IPE.
Introduction
In the context of contemporary education, ideological and political education (IPE) plays a central role in cultivating individuals with firm ideals and beliefs, sound value orientations, and a strong sense of social responsibility1. However, traditional IPE models often encounter practical challenges, including monotonous content presentation and limited learner engagement2. With the rapid advancement of digital technologies and the increasing diversity of educational scenarios, innovating the delivery methods of IPE has become essential. Additionally, improving the precision and effectiveness of educational resource provision has emerged as a critical challenge that requires urgent attention in the education sector3.
As an important carrier of China’s revolutionary culture and advanced socialist culture, red music contains rich elements of IPE4. Red music emerged during the historical periods of the New Democratic Revolution, socialist construction, and reform and opening. It not only chronicles the Chinese Communist Party’s leadership and the people’s struggles through artistic expression but also embodies core values such as patriotism, collectivism, and revolutionary heroism5. From the passionate resistance expressed in The Yellow River Cantata to the reform enthusiasm conveyed in On the Hopeful Field, red music creates a distinct spiritual map. Through the dual narrative of melody and lyrics, it provides vivid emotional resonance and serves as an effective medium for transmitting values in IPE6. The rise of intelligent deep learning technology has opened new paths for the innovative application of red music in IPE resources7. Deep learning models, with their strong capabilities in feature extraction, pattern recognition, and data mining, can perform multi-level analyses of extensive red music resources. These analyses span melodic structures, lyrical semantics, and social-emotional contexts, enabling intelligent classification, personalized recommendation, and contextual adaptation of educational resources8. By establishing an intelligent chain connecting the “red music feature space,” learner cognitive profiles, and education goal matching, this approach aims to overcome the traditional “flood irrigation” method of resource supply. It seeks to build a new educational ecosystem centered on precise demand identification, dynamic content generation, and real-time feedback optimization. This shift promotes IPE from an experience-driven model to a data-driven framework and transforms one-way dissemination into interactive, two-way engagement9.
This study focuses on the interdisciplinary integration of red music, IPE, and deep learning. It addresses key questions such as how deep learning can uncover the educational connotations of red music, how to construct an intelligent recommendation model that accounts for learners’ personalized characteristics, and how effectively the model enhances the relevance and appeal of IPE. The findings not only enrich the theoretical understanding of digital technologies empowering IPE but also provide practical technical solutions for the preservation and innovation of red culture in the new era. Ultimately, this contributes to cultivating socialist builders and successors equipped with both cultural confidence and a strong sense of contemporary responsibility.
Literature review
Against the backdrop of educational digital transformation and the innovative development of IPE, the integration of red music into IPE resource recommendation has emerged as a new hotspot in educational research and practice10. In recent years, many scholars have conducted in-depth discussions on educational resource recommendation systems. Urdaneta-Ponte et al. (2021) conducted a systematic review of the application status, challenges, and future trends of recommendation systems in educational scenarios, laying a theoretical foundation for follow-up research11. Machado et al. (2021) proposed an adaptive educational resource recommendation framework, emphasizing dynamic adjustment of recommendation strategies based on learners’ characteristics and learning contexts to improve resource adaptability12. Tavakoli et al. (2022) constructed an AI-based open recommendation system that integrated labor market demands with personalized education, expanding the application scope of educational resource recommendation13. In the context of adaptive learning support and personalized review material recommendation, Okubo et al. (2022) developed a system that leveraged learners’ historical data and learning behavior patterns to deliver resources with high precision. This approach effectively enhanced learning outcomes by tailoring materials to individual needs and learning trajectories14.
Focusing on personalized recommendation of educational resources, Raj and Renumol (2022) conducted a systematic literature review covering 2015–2020. They analyzed the development context, technical architectures, and application effectiveness of adaptive content recommenders in personalized learning environments15. Fu et al. (2022) developed a personalized educational resource recommendation system leveraging big data. Their approach applied data mining and analysis techniques to filter content aligned with learners’ interests and abilities from massive datasets16. Zhu (2023) employed an adaptive genetic algorithm to enable personalized recommendations, improving both the search efficiency and accuracy of the recommendation model17.
In the specific context of I&P course resources, Xu and Chen (2023) proposed a targeted recommendation system that integrated IPE objectives, students’ cognitive levels, and red cultural backgrounds, providing customized resource services for I&P teaching18. Beyond education, Bhaskaran and Marappan (2023) optimized recommendation systems for public machine learning datasets by refining modeling and analysis methods, which enhanced both the accuracy and reliability of recommendation outcomes19. Gm et al. (2024) provided a comprehensive review of the applications of digital recommendation systems in personalized learning, covering system architecture, technical support, and practical effect evaluation. This offered multidimensional references for constructing a recommendation model for red music-based IPE resources20.
Overall, existing research has made significant progress in the theoretical foundations, technical implementations, and practical applications of educational resource recommendation systems. However, a gap remains in the personalized recommendation of IPE resources integrated with red music. Most studies have not fully examined the unique cultural connotations, emotional significance, and IPE elements embedded in red music, making it difficult to achieve precise alignment between recommended resources and deeper educational objectives. Additionally, the analysis and modeling of learners’ characteristics in red music IPE contexts remain incomplete. There is no personalized index system that adequately reflects students’ cognition, emotional resonance, and value identification within red culture. Therefore, building on existing research insights, developing an intelligent deep learning model for recommending IPE resources based on red music requires addressing three key technical challenges: effective feature extraction from red music, construction of detailed learner cognitive profiles, and optimization of the recommendation algorithm. This study aims to fill this research gap and provide strong support for the innovative development of IPE in the new era. Table 1 compares this study with existing educational resource recommendation methods.
Research methodology
This study developed an IPE recommendation model integrated with red music. It employs a comprehensive research framework that combines multimodal processing, graph neural networks, and reinforcement learning techniques.
For multimodal feature extraction from red music resources, cross-modal alignment and hierarchical feature fusion strategies are adopted21,22. For the audio modality, a time-frequency attention mechanism is introduced on top of Mel-spectrum analysis23. First, the time-frequency representation \(S(t,f)\) is obtained by the short-time Fourier transform (STFT). A dual-channel attention module then computes the attention weight \(A_{T}\) along the time dimension and \(A_{F}\) along the frequency dimension, respectively:
Finally, weighted fusion generates the enhanced audio feature \(\mathbf{X}_{audio}\)24. For the text modality, dynamic word vectors from the Bidirectional Encoder Representations from Transformers (BERT) model are combined with knowledge graph embedding, so that knowledge elements related to red music, such as historical events, people, and spiritual connotations, are integrated into the semantic representation. High-level semantic aggregation is then carried out through GCNs, yielding the text feature \(\mathbf{X}_{text}\), which is rich in IPE elements25. Figure 1 shows the structure of GCNs.
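The dual-channel attention step described above can be illustrated with a minimal sketch. It assumes a precomputed magnitude spectrogram (for example from an STFT) and uses mean energy followed by a softmax as the attention score; this scoring rule, the function names, and the outer-product weighting are illustrative assumptions, since a trained model would use learned projections instead.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def time_frequency_attention(spec):
    """Reweight a magnitude spectrogram spec[t][f] along time and frequency.

    Assumption: scores are the mean energy per frame and per frequency bin;
    the learned attention module of the paper is not reproduced here.
    """
    n_t, n_f = len(spec), len(spec[0])
    time_scores = [sum(row) / n_f for row in spec]
    freq_scores = [sum(spec[t][f] for t in range(n_t)) / n_t for f in range(n_f)]
    a_t = softmax(time_scores)   # attention over time frames (A_T)
    a_f = softmax(freq_scores)   # attention over frequency bins (A_F)
    # Weighted fusion: scale each cell by both attention weights.
    weighted = [[a_t[t] * a_f[f] * spec[t][f] for f in range(n_f)]
                for t in range(n_t)]
    return a_t, a_f, weighted

# Toy 3-frame x 3-bin spectrogram; the middle frame carries most energy.
spec = [[0.1, 0.2, 0.1], [0.9, 1.5, 0.3], [0.2, 0.4, 0.1]]
a_t, a_f, x_audio = time_frequency_attention(spec)
```

In this toy input the middle frame and the middle frequency bin receive the largest weights, which is the behavior the attention mechanism is meant to provide for emotionally salient passages.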
In Fig. 1, a cross-modal interaction module is implemented using a gated recurrent unit (GRU) during multimodal fusion. The gating mechanism adaptively adjusts the fusion weights of audio and text information, enabling dynamic integration of the two modalities.
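As a rough illustration of this gating idea, the sketch below fuses an audio vector and a text vector element by element with an update gate and a reset gate. The scalar weights `w_z`, `w_r`, and `w_h` stand in for learned weight matrices, and treating the text feature as the previous state and the audio feature as the input is an assumption made only for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(x_audio, x_text, w_z=1.0, w_r=1.0, w_h=1.0):
    """Fuse two equal-length feature vectors with GRU-style gates.

    Assumption: scalar weights replace the learned matrices of a real GRU,
    and the gates act element-wise.
    """
    fused = []
    for a, t in zip(x_audio, x_text):
        z = sigmoid(w_z * (a + t))                 # update gate
        r = sigmoid(w_r * (a + t))                 # reset gate
        candidate = math.tanh(w_h * (a + r * t))   # candidate fused state
        fused.append((1.0 - z) * t + z * candidate)
    return fused

x_music = gated_fusion([0.5, -0.2], [0.1, 0.3])
```

The update gate decides how much of the candidate state replaces the text feature, so the fusion weight varies per dimension rather than being fixed in advance.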
\(\mathbf{z}\) is the update gate and \(\mathbf{r}\) is the reset gate; the fused feature \(\mathbf{X}_{music}\) is output through iterative calculation26. Learner profile construction combines dynamic cognitive diagnosis with personalized preference modeling27. Based on the Deterministic Input, Noisy “And” Gate (DINA) model, this study assesses students’ mastery of ideological and political (I&P) knowledge. It quantifies learners’ cognitive states across different I&P knowledge points, such as party history, revolutionary spirit, and socialist core values, using multidimensional item response theory.
\(\mathbf{Y}_{ij}\) denotes student \(i\)'s response to question \(j\). \(\boldsymbol{\theta}_{i}\) is the student’s ability vector, and \(\mathbf{a}_{j}\) and \(\mathbf{b}_{j}\) are the question discrimination and difficulty parameters, respectively19. Students’ learning behavior sequences (clickstream data, dwell time, and interaction frequency) are then used to build a temporal preference prediction model with a Transformer encoder-decoder architecture, in which the self-attention mechanism captures long-term dependencies in learning behavior, and a context-awareness module is introduced28. External factors such as learning time, device type, and network environment are encoded as a context vector \(\mathbf{C}\). Finally, the cognitive diagnosis results and preference features are integrated to generate the dynamic learner profile \(\mathbf{X}_{user}\)29. The design of the multimodal architecture is shown in Fig. 2.
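The two diagnostic ingredients named above can be sketched in a few lines. The first function uses a common logistic form of multidimensional item response theory, and the second uses the textbook DINA response rule with slip and guess parameters. Both are standard formulations chosen for illustration; the exact parameterization used in this study is not shown in the text, so the function names and the vector-valued difficulty are assumptions.

```python
import math

def mirt_prob(theta, a, b):
    """P(correct) under a logistic multidimensional IRT form.

    Assumption: logit = sum_k a_k * (theta_k - b_k), with theta the ability
    vector, a the discrimination vector, and b the difficulty vector.
    """
    logit = sum(a_k * (t_k - b_k) for a_k, t_k, b_k in zip(a, theta, b))
    return 1.0 / (1.0 + math.exp(-logit))

def dina_prob(alpha, q, slip, guess):
    """P(correct) under the DINA model.

    alpha: binary mastery vector of the student.
    q: binary vector of knowledge points the question requires.
    The student answers correctly with probability 1 - slip when every
    required point is mastered, and with probability guess otherwise.
    """
    masters_all = all(alpha_k >= q_k for alpha_k, q_k in zip(alpha, q))
    return 1.0 - slip if masters_all else guess

# Hypothetical knowledge points: party history, revolutionary spirit, core values.
p_mastered = dina_prob([1, 1, 0], [1, 1, 0], slip=0.1, guess=0.2)
p_missing = dina_prob([1, 0, 0], [1, 1, 0], slip=0.1, guess=0.2)
p_average = mirt_prob([0.0, 0.0], [1.0, 1.0], [0.0, 0.0])
```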
As shown in Fig. 2, the multimodal architecture integrates multiple technologies into a single framework capable of collecting various types of information. In addition, the recommendation model employs heterogeneous information networks (HINs) to construct a complex graph comprising multiple types of nodes and relationships. These include red music resources, learners, I&P knowledge points, and educational objectives, enabling the system to capture rich interactions and dependencies across diverse entities30. A meta-path-guided heterogeneous graph attention network (HGAT) is adopted for node representation learning. For a given meta-path \(P\), the attention coefficient between nodes \(u\) and \(v\) is calculated as:
\(\mathbf{h}_{u}\) and \(\mathbf{h}_{v}\) are node embedding vectors, and \(\mathbf{a}_{P}\) is a meta-path-specific attention parameter. The information from different meta-paths is aggregated by a multi-head attention mechanism to generate network embedding representations of resources and users31. In recommendation decision-making, the recommendation score function is optimized by combining the Bayesian personalized ranking (BPR) loss function with the logical constraints of the curriculum knowledge graph:
\(\sigma\) is the sigmoid function. \(\hat{y}_{uij}\) is user \(u\)'s predicted preference score for resource \(i\) relative to resource \(j\). \(\Theta\) denotes the model parameters, which are iteratively optimized by stochastic gradient descent32.
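The two steps described above, meta-path attention and pairwise ranking, can be sketched together. The attention score follows the usual graph-attention form (a LeakyReLU over the dot product of the meta-path parameter with the concatenated embeddings, then a softmax over neighbors), and the loss is the standard BPR objective with L2 regularization. The function names, the regularization weight, and the omission of the knowledge-graph constraints are assumptions of this sketch, not details confirmed by the study.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def metapath_attention(h_u, neighbors, a_p):
    """Softmax-normalized attention of node u over its meta-path neighbors.

    Score for neighbor v: LeakyReLU(a_P . [h_u || h_v]).
    """
    scores = []
    for h_v in neighbors:
        concat = h_u + h_v  # list concatenation plays the role of [h_u || h_v]
        scores.append(leaky_relu(sum(a * x for a, x in zip(a_p, concat))))
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def bpr_loss(score_pos, score_neg, params, reg=0.01):
    """BPR loss for one (user, preferred item, other item) triple.

    -ln sigmoid(score_pos - score_neg) equals softplus(score_neg - score_pos);
    an L2 penalty on the parameters is added with the assumed weight reg.
    """
    return softplus(score_neg - score_pos) + reg * sum(p * p for p in params)

alphas = metapath_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]],
                            [0.5, 0.5, 1.0, -1.0])
loss_good = bpr_loss(2.0, 0.0, params=[])
loss_bad = bpr_loss(0.0, 2.0, params=[])
```

A correctly ordered pair (`loss_good`) yields a smaller loss than a wrongly ordered one (`loss_bad`), which is what drives the ranking during training.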
During the model optimization stage, a hierarchical reinforcement learning framework is applied. The upper-level policy network develops recommendation strategies based on macro-educational goals, such as value shaping and knowledge mastery. The lower-level executive network then fine-tunes specific recommendation content according to users’ real-time feedback, including learning completion and emotional response data33. The policy network \(\mu_{\theta}(s)\) and the value network \(Q_{\omega}(s,a)\) are optimized with the twin delayed deep deterministic policy gradient (TD3) algorithm by minimizing the mean squared error loss function:
\(\gamma\) is the discount factor. \(\theta'\) and \(\omega'\) are the target network parameters, and training stability is improved by the soft update rules \(\theta' \leftarrow \tau\theta + (1-\tau)\theta'\) and \(\omega' \leftarrow \tau\omega + (1-\tau)\omega'\) (\(0 < \tau \ll 1\))34. Simultaneously, the recommended results are validated against the logical rules of the curriculum knowledge graph. Semantic constraints ensure that the recommended content aligns with both the knowledge system and value orientation of IPE35,36.
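The soft update rule above, together with the temporal-difference target it stabilizes, can be sketched as follows. The target uses the minimum of two critic estimates, as in the standard TD3 formulation; the twin critics, the `done` flag, and the default value of `tau` are assumptions of this sketch, since the text only states that \(\tau\) is much smaller than 1.

```python
def td3_target(reward, gamma, q1_next, q2_next, done=False):
    """Bootstrapped target: r + gamma * min(Q1', Q2') for non-terminal steps."""
    if done:
        return reward
    return reward + gamma * min(q1_next, q2_next)

def mse_loss(predictions, targets):
    """Mean squared error between critic predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def soft_update(target_params, online_params, tau=0.005):
    """theta' <- tau * theta + (1 - tau) * theta', applied per parameter."""
    return [tau * w + (1.0 - tau) * w_t
            for w_t, w in zip(target_params, online_params)]

y = td3_target(reward=1.0, gamma=0.9, q1_next=2.0, q2_next=3.0)
critic_loss = mse_loss([2.5], [y])
new_target = soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1)
```

Because only a small fraction `tau` of the online parameters is mixed in at each step, the target networks change slowly and the regression target stays comparatively stable.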
By integrating these multiple technologies, the proposed Graph Convolutional Networks–Transformer–Heterogeneous Information Networks (GCNs-Transformer-HINs) model achieves deep semantic mining and personalized, precise recommendation of red music IPE resources. The model continuously refines its recommendation strategies through a dynamic feedback mechanism, creating a closed-loop ecosystem of “data-driven analysis → intelligent decision-making → effect evaluation → strategy iteration.” This approach provides robust theoretical and technical support for the efficient utilization of IPE resources. The overall workflow of the proposed model in this study is illustrated in Fig. 3.
As shown in Fig. 3, the workflow systematically integrates several key modules: multimodal feature extraction of red music, multimodal fusion, dynamic learner profile construction, HINs representation learning, and recommendation strategy optimization. By jointly modeling audio and textual features and combining them with learners’ cognitive and behavioral data, the system builds personalized dynamic profiles. A heterogeneous graph attention network is then used to explore the complex relationships between resources and users. Recommendation decisions are dynamically optimized through hierarchical reinforcement learning, while semantic constraints from the knowledge graph and user feedback ensure both accuracy and educational value. The result is an efficient, intelligent, and personalized recommendation loop aligned with IPE objectives.
Table 2 presents the challenges faced by traditional IPE resource recommendation and the corresponding solutions offered by the proposed model.
This study extends beyond the development of technical models to actively advance both red culture education and IPE theory. It employs multimodal deep learning methods that integrate historical context, emotional expression, and textual semantics of red music. This approach overcomes the limitations of traditional single-modality analyses and enables more precise capture of the cultural and ideological meanings embedded in red music, facilitating a deeper understanding of how it conveys ideological messages. By combining dynamic cognitive diagnosis with personalized learner profiles, the model reflects learners’ cognitive states and emotional responses in real time. This shifts IPE from static knowledge delivery to interactive, adaptive learning and illustrates a practical application of the “internalization–externalization” theory from educational psychology. Additionally, reinforcement learning is used to optimize recommendation strategies, creating a closed-loop feedback system that aligns IPE content with individual cognitive development. This enriches research on the dynamic mechanisms of learning motivation and behavioral regulation in ideological transformation. The proposed framework improves the efficiency and accuracy of educational resource recommendations while providing new perspectives and methodological support for the advancement of red culture education and IPE theory.
Experimental design and performance evaluation
Datasets collection and experimental environment
This study utilized the China Red Music Digital Resource Database (CRMRD) as the primary experimental dataset. Supported by the Ministry of Culture and Tourism of China, this publicly accessible database (http://www.crmrd.cn) contains over 3,000 red music works spanning from 1921 to the present. It covers historical periods including the revolutionary war, socialist construction, and reform and opening. The dataset includes rich multimodal annotations: audio files (MP3 format, 44.1 kHz sampling rate), lyrics text, creative backgrounds (e.g., historical events, composers’ biographies), IPE labels (such as patriotism, collectivism, and revolutionary spirit), and user interaction data (learning duration, favorites/likes). Lyrics texts are annotated using a combination of manual labeling and BERT-based named entity recognition to extract structured information on events, characters, and emotional vocabulary. Audio data is analyzed and annotated with musicological features, including melodic modes and rhythmic patterns, by a professional music analysis team. The dataset is divided into a training set (2,000 pieces), a validation set (500 pieces), and a test set (500 pieces), supporting multimodal feature extraction, learner profile construction, and recommendation model training. Its authority and richness provide a reliable foundation for model evaluation and verification.
The experimental environment was established on a high-performance computing platform with the following hardware: Intel Xeon Gold 6240 CPU (2.6 GHz, 24 cores), NVIDIA Tesla V100 GPU (32 GB VRAM), 128 GB RAM, and 2 TB SSD storage, enabling large-scale parallel processing and deep learning training. The software environment was based on Python (v3.8, https://www.python.org/downloads/release/python-380/) and PyTorch (v2.0, https://pytorch.org/get-started/pytorch-2-x/), with Compute Unified Device Architecture (CUDA v11.8, https://developer.nvidia.com/cuda-11-8-0-download-archive) and cuDNN (v8.6, https://developer.nvidia.com/rdp/cudnn-archive) for GPU acceleration. Multimodal data processing relied on Librosa (v0.9.2, https://github.com/librosa/librosa/releases) for audio analysis, Hugging Face Transformers (v4.2.2, https://huggingface.co/transformers/v4.2.2/installation.html) for natural language processing, and NetworkX (v2.8.4, https://networkx.org/documentation/stable/release/release_2.8.4.html) for graph neural networks. Recommendation model training employed a distributed strategy, with parameter synchronization and optimization managed via PyTorch Lightning (v1.6.5, https://lightning.ai/docs/pytorch/1.6.5/). The entire experimental workflow was tracked and managed using MLflow (v2.3.0, https://newreleases.io/project/github/mlflow/mlflow/release/v2.3.0) for data version control, training logs, and hyperparameter tuning. This environment ensures both computational efficiency and algorithmic scalability, meeting the complex requirements of multimodal feature extraction, heterogeneous network modeling, and reinforcement learning-based optimization while maintaining stable and efficient model training (https://zenodo.org/doi/https://doi.org/10.5281/zenodo.10421362).
Parameters setting
When constructing an intelligent deep learning model for recommending IPE resources integrated with red music, the rational setting of hyperparameters is crucial to the model’s performance. Different hyperparameters directly affect the model’s learning efficiency, generalization ability, and recommendation accuracy. To achieve optimal model performance, this study carefully adjusted and optimized the key hyperparameters. The specific hyperparameter settings are listed in Table 3.
Performance evaluation
Model effectiveness evaluation
To thoroughly assess the effectiveness of the intelligent deep learning model for recommending IPE resources integrated with red music, this study developed an evaluation framework with three key dimensions: recommendation accuracy, generalization ability, and educational adaptability. Recommendation accuracy was measured using standard metrics, including Accuracy, Precision, Recall, and F1-score. Using these metrics, the proposed model was compared against several baseline approaches: graph convolutional networks (GCNs), Transformer-based models, HINs, Collaborative Filtering (CF), BERT-based recommendation models, and Term Frequency–Inverse Document Frequency (TF-IDF) models. As shown in Fig. 4, the proposed model consistently outperformed all baseline models across multiple recommendation performance metrics. Each point in the figure indicates the improvement achieved by the proposed model relative to the corresponding baseline, demonstrating its superior capacity to provide accurate, generalizable, and pedagogically relevant recommendations.
(a-d show the evaluation of improvement effects on Accuracy, Recall, Precision, and F1-score, respectively)
As shown in Fig. 4, the proposed model consistently demonstrated stable superiority over baseline models across different training epochs in terms of accuracy improvement. At 200 epochs, the model outperformed GCNs, Transformer, and HINs by 30.7%, 26.2%, and 31.0%, respectively. Improvements over CF, BERT, and TF-IDF also exceeded 25%. As training progressed to 1,000 epochs, the improvement rates fluctuated slightly but generally increased, with the most notable gains observed for GCNs (34.9%), BERT (34.2%), and HINs (33.1%). This indicates that the model can fully leverage multimodal feature fusion and dynamic optimization strategies to maintain a high accuracy advantage over long-term training.
For Recall, the model performed particularly well at 600 epochs, achieving improvements of 34.1% over HINs, 33.8% over GCNs, and 31.2% over BERT, significantly surpassing other epochs. This suggests that in the mid-training stage, the model effectively captures learners’ latent preferences, enhancing resource coverage. Even at the early stage of training (200 epochs), the model achieved substantial improvements in Recall. Specifically, it outperformed HINs by 30.2% and GCNs by 29.5%. Improvements over CF and BERT were approximately 27%–28%. These results indicate that the model can significantly enhance coverage performance right from the beginning.
In terms of Precision, the model achieved notable improvements over all baseline models. The most pronounced gains were observed at 400 and 1,000 epochs. Precision increased by 33.9% and 34.1% over BERT, 34.8% and 35.0% over HINs, and more than 31% over GCNs. Even at 200 epochs, Precision rose by 28.6% over HINs, 26.1% over CF, and 27.8% over BERT. These results demonstrate that the model can effectively filter out irrelevant or low-relevance recommendations even before it fully converges.
The F1-score, reflecting the balance between Precision and Recall, remained stable across epochs. At 400 epochs, improvements were 33.6% over Transformer and 31.2% over BERT, both at high levels. At 200 epochs, the F1-score improvement over Transformer reached 33.5%, indicating that the model can maintain a balance between recommendation accuracy and coverage even at an early stage. By 1,000 epochs, the F1-score improvements were 31.2% over HINs, 29.7% over CF, and 30.8% over BERT, demonstrating sustained performance balance after long-term training without bias toward a single metric.
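For reference, the four metrics discussed above can be computed from binary relevance labels as in the minimal sketch below. Treating each recommendation as a binary relevant or not-relevant decision is an assumption made for illustration; the function name is hypothetical. F1 is the harmonic mean of Precision and Recall.

```python
def recommendation_metrics(y_true, y_pred):
    """Accuracy, Precision, Recall, and F1 from binary labels (1 = relevant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

metrics = recommendation_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```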
Overall, the proposed model achieved consistent improvements of over 23% across core recommendation accuracy metrics, with some indicators reaching as high as 35%. These results confirm that integrating red music features and optimizing the deep learning architecture significantly enhances the efficiency and precision of IPE resource recommendations. They also highlight the advantages of multimodal data fusion and intelligent algorithms in improving educational resource matching.
Model generalization ability evaluation
Generalization ability is evaluated through cross-validation error and test set loss. Five-fold cross-validation is used to reduce data bias and to check model stability under different data distributions. To reflect the characteristics of IPE scenarios, two additional metrics are introduced: the Educational Match Score (EMS) and the Emotional Resonance Score (ERS). EMS quantifies the fit between recommended resources and IPE objectives through expert scoring and knowledge graph semantic matching. ERS is based on sentiment analysis of user comments, using a BERT sentiment classification model to calculate the proportion of positive emotional feedback triggered by resources. The equations are as follows:
\(e_{j}\) is the I&P element of recommended resources, \(E_{target}\) is the target education element, and \(M\) is the number of resource elements. \(S\) is the total number of user comments, and \(positive(s)\) represents the positive emotional tag of comment \(s\). Figure 5 displays the evaluation results of the model generalization ability.
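One plausible reading of the two scores is sketched below. EMS is approximated as the share of a resource's I&P elements that fall within the target set, and ERS as the share of comments tagged positive. The exact-match rule, the function names, and the string labels are assumptions for illustration; the study itself relies on expert scoring, knowledge graph semantic matching, and a BERT sentiment classifier, none of which are reproduced here.

```python
def educational_match_score(resource_elements, target_elements):
    """Fraction of the M resource elements found in the target element set.

    Assumption: exact membership stands in for semantic matching.
    """
    if not resource_elements:
        return 0.0
    target = set(target_elements)
    matched = sum(1 for e in resource_elements if e in target)
    return matched / len(resource_elements)

def emotional_resonance_score(comment_labels):
    """Fraction of the S comments whose sentiment label is 'positive'."""
    if not comment_labels:
        return 0.0
    positive = sum(1 for label in comment_labels if label == "positive")
    return positive / len(comment_labels)

ems = educational_match_score(
    ["patriotism", "collectivism", "nostalgia"],
    ["patriotism", "collectivism", "revolutionary spirit"],
)
ers = emotional_resonance_score(["positive", "negative", "positive", "positive"])
```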
(a and b show the evaluation of improvement effects on the Educational Match Score and the Emotional Resonance Score, respectively)
As shown in Fig. 5, in the comparative evaluation tailored to IPE scenarios, the proposed model demonstrates clear advantages. Compared with traditional recommendation schemes and similar models in the education field, the model achieves substantial improvements in the core dimensions that matter for IPE. The data indicate that the model consistently improves educational scenario adaptability and user emotional interaction by more than 18%. In certain dimensions, the optimization effect is even more pronounced, with the highest improvement rate reaching nearly 29%.
This improvement intuitively reflects the model’s ability to deeply mine the connotations of red music in IPE. By integrating the historical context, melodic emotions, and lyrical semantics of red music, the model more accurately captures the relationships between educational resources and IPE objectives. This enables dynamic alignment of educational content with learners’ training needs throughout the recommendation process. Meanwhile, the model’s fine-grained characterization of learners’ personalized features enables it to keenly identify different users’ emotional response patterns to red music. Through intelligent adjustments of recommendation strategies, it strengthens the positive feedback between resource delivery and emotional resonance.
Experimental results further demonstrate that the model’s advantages extend beyond overall performance improvements. They are also evident in the close integration of recommendation outcomes with IPE scenarios. For both classic red songs with revolutionary historical themes and contemporary main melody works, the model achieves efficient alignment between resource value and user needs. This is accomplished through differentiated feature extraction and tailored recommendation logic. Such improvements not only represent technical optimization but also illustrate the innovative integration of red cultural resources with intelligent algorithms in educational contexts, providing a measurable foundation for the digital transformation of IPE.
Training and testing accuracy and loss curves
To visually present the convergence speed and performance stability of the model during training, this study tracked accuracy and loss over 1,000 training epochs. The specific results are shown in Fig. 6.
As illustrated in Fig. 6, the proposed model’s training accuracy gradually increased from 0.923 to 0.95, while the testing accuracy rose from 0.907 to 0.927. This demonstrates that the model achieves strong performance during both training and testing and maintains high generalization capability. Simultaneously, the training loss decreased from 0.121 to 0.052, and the testing loss declined from 0.153 to 0.090, showing a steady reduction in errors and stable training without overfitting. Overall, these curves confirm that the model converges quickly, continuously improves during training, and exhibits excellent stability and reliability.
Table 4 presents representative test samples along with the corresponding model recommendation results.
Table 4 presents a selection of representative test samples along with their corresponding model recommendation results, demonstrating the model’s precision and relevance in personalized recommendation. For instance, Sample 001, based on the user’s preference for high-energy melodies and keywords such as “revolution” and “struggle,” was recommended The Yellow River Cantata and On the Hopeful Field. Sample 002, considering the user’s browsing behavior related to anti-Japanese war resources, received a recommendation for Railway Guerrilla, which has high emotional relevance. Moreover, the model can intelligently adjust recommendations according to users’ preferences for rhythm and emotional expression (Sample 003) or address learners’ weaker knowledge areas (Sample 004). This illustrates the model’s ability to efficiently align educational resources with user needs through multimodal feature integration and personalized learner profiling.
Model efficiency evaluation
To assess the model’s practical runtime efficiency, inference time and resource consumption were tested using a mainstream server equipped with an NVIDIA Tesla V100 GPU. The detailed results are presented in Table 5.
Table 5 shows that the proposed model strikes a practical balance between efficiency and capacity. The inference time for a single recommendation is 28 ms, which is considerably faster than Transformer (40 ms) and BERT (45 ms), slightly faster than GCNs (35 ms) and HINs (32 ms), and slower only than lightweight models such as CF (15 ms) and TF-IDF (10 ms). The model contains 15.4 M parameters, a moderate size that substantially reduces computational load compared with BERT’s 110 M parameters. GPU memory usage is 3.6 GB, which meets the requirements of mainstream servers and some edge computing devices and is lower than that of Transformer (4.2 GB) and BERT (8.5 GB). CPU inference time is 180 ms, faster than GCNs (220 ms) and Transformer (260 ms), ensuring responsive performance. Training a single epoch takes 35 s, and five-fold cross-validation completes in 9 h, making the model suitable for medium- to large-scale datasets. Overall, the model combines high recommendation performance with low computational demands and fast inference, demonstrating strong practical applicability.
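The relative differences behind these comparisons can be computed from the figures quoted above. The sketch below uses only numbers stated in the text for Table 5; the helper `relative_change` and the sequential throughput estimate are illustrative assumptions, not part of the reported evaluation.

```python
# GPU inference time per recommendation (ms), as quoted in the text for Table 5.
gpu_latency_ms = {
    "Proposed": 28, "Transformer": 40, "BERT": 45,
    "GCN": 35, "HIN": 32, "CF": 15, "TF-IDF": 10,
}


def relative_change(baseline, value):
    """Percentage change of `value` relative to `baseline` (negative = reduction)."""
    return 100.0 * (value - baseline) / baseline


proposed = gpu_latency_ms["Proposed"]
for name, latency in gpu_latency_ms.items():
    if name != "Proposed":
        print(f"vs {name}: {relative_change(latency, proposed):+.1f}% latency")

# Parameter count (millions) and GPU memory (GB), proposed model vs BERT.
params_change = relative_change(110.0, 15.4)
memory_change = relative_change(8.5, 3.6)

# Rough upper bound on sequential requests per second on one GPU,
# assuming no batching and no queuing overhead.
throughput_per_s = 1000.0 / proposed
print(f"parameters vs BERT: {params_change:+.1f}%")
print(f"GPU memory vs BERT: {memory_change:+.1f}%")
print(f"sequential throughput: {throughput_per_s:.1f} requests/s")
```

With the quoted figures, the proposed model's latency is 30% lower than Transformer's, and its parameter count is about 86% lower than BERT's.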
Discussion
Driven by the dual goals of digitally transforming IPE and preserving red culture, the intelligent deep learning model for recommending IPE resources integrated with red music advances educational resource delivery by combining several techniques. The model begins with multimodal feature extraction from red music. Audio emotional rhythms are analyzed using short-time Fourier transform and time-frequency attention mechanisms. For example, rhythmic patterns are mapped to the fighting emotions conveyed in The Yellow River Cantata. Lyrics are semantically processed through a combination of bidirectional encoder representations and GCNs. This enables the extraction of IPE-related keywords, such as “reform” and “struggle,” from works like On the Hopeful Field. These processes form a cross-modal feature space where artistic characteristics and educational elements are closely intertwined.
Although this study focuses on Chinese red music, the cross-modal feature modeling and fusion approach can be applied to other emotion-driven educational domains. Examples include courses on the Anti-Japanese War, folk music education, or dramatic literature. The approach is feasible because emotion–semantic couplings commonly exist across modalities such as music, video, and text. Adapting the model to a new domain requires constructing relevant knowledge graphs and emotion-label systems. However, for subjects with limited emotional content or hard-to-quantify artistic features (such as higher mathematics or formal logic), the benefits of multimodal modeling may be reduced. In such cases, domain-specific structured feature modeling may be necessary.
At the learner profiling level, the model combines cognitive diagnosis based on the deterministic input–noisy “AND” (DINA) model with Transformer-based temporal modeling. This quantifies students’ mastery of knowledge points, including party history and revolutionary spirit. A context-aware module additionally captures the dynamics of the learning environment, such as fragmented learning behaviors during mobile study. The result is a three-dimensional learner profile that integrates static capabilities, dynamic behaviors, and environmental variables. These profiles provide precise anchors for personalized recommendation.
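To make the cognitive diagnosis component concrete, the sketch below implements the standard DINA response model: a learner answers an item correctly with probability 1 − slip if they have mastered every knowledge point the item requires, and with probability guess otherwise. The knowledge points, Q-matrix, and slip/guess values are hypothetical illustrations, and the sketch does not reproduce the Transformer-based temporal modeling or the paper's parameter estimation.

```python
import numpy as np


def dina_correct_prob(alpha, q_matrix, slip, guess):
    """Probability of a correct response under the standard DINA model.

    alpha:    (n_learners, n_skills) binary mastery indicators
    q_matrix: (n_items, n_skills) binary map of skills each item requires
    slip:     (n_items,) probability of erring despite full mastery
    guess:    (n_items,) probability of answering correctly without mastery
    """
    alpha = np.asarray(alpha)
    q_matrix = np.asarray(q_matrix)
    slip = np.asarray(slip, dtype=float)
    guess = np.asarray(guess, dtype=float)
    # eta[i, j] = 1 only if learner i has mastered every skill item j requires.
    eta = np.all(alpha[:, None, :] >= q_matrix[None, :, :], axis=2).astype(float)
    # For binary eta this equals (1 - slip)**eta * guess**(1 - eta).
    return eta * (1.0 - slip) + (1.0 - eta) * guess


# Hypothetical example: skills = ["party history", "revolutionary spirit"].
alpha = [[1, 1],   # learner 0 has mastered both knowledge points
         [1, 0]]   # learner 1 has mastered only party history
q_matrix = [[1, 0],  # item 0 requires party history
            [1, 1]]  # item 1 requires both knowledge points
probs = dina_correct_prob(alpha, q_matrix, slip=[0.1, 0.2], guess=[0.2, 0.1])
print(probs)
```

In this toy setting, learner 1 falls back to the guess probability on item 1 because one required knowledge point is missing, which is the kind of gap a profile could flag for targeted recommendation.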
The profiling mechanism is scalable to cross-cultural and multilingual educational environments, particularly for the international dissemination of red culture or IPE. Implementation strategies include the following. First, the existing Chinese knowledge graph can be extended or replaced with multilingual versions, and cross-language embedding models can be introduced to ensure semantic consistency. Second, localized emotion-label systems can be constructed for different cultural contexts. For example, in English-language settings, keywords such as “freedom” and “justice” can be added. Third, culturally specific learning behavior features can be incorporated into the context-aware module, such as collective learning preferences or ritual participation, to improve the cultural adaptability of recommendations. It is important to note that semantic shifts and variations in emotional expression across languages may reduce feature-matching accuracy. To mitigate this, alignment mechanisms and cross-cultural data augmentation strategies should be applied during model training.
During learner profiling, all behavioral data are anonymized, and participants provide informed consent to ensure privacy protection and ethical compliance. The recommendation model is built around HINs. It employs meta-path guided heterogeneous graph attention mechanisms to capture complex correlations among red music resources, learners, and I&P knowledge points—for example, the semantic path “work → historical event → educational goal.” Bayesian personalized ranking combined with hierarchical reinforcement learning enables dual closed-loop optimization, aligning macro-level educational goals with micro-level user feedback. Experimental results show that the model outperforms baseline approaches across multiple dimensions. Precision increases by 28.5%, recall by 25.3%, educational match degree by up to 29%, and emotional resonance by 27%. Among younger users, preference for new-era main melody works rises by 32%, demonstrating that the model effectively enhances IPE engagement and affinity through technological empowerment.
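The semantic path “work → historical event → educational goal” can be illustrated by composing adjacency matrices along the meta-path, a common way to derive meta-path-based connectivity in a heterogeneous information network. This is a minimal sketch: the node lists and links are hypothetical examples, and it covers only path counting, not the attention mechanism, Bayesian personalized ranking, or reinforcement learning stages.

```python
import numpy as np

# Hypothetical node sets for three node types in the network.
works = ["The Yellow River Cantata", "Railway Guerrilla", "On the Hopeful Field"]
events = ["Anti-Japanese War", "Reform era"]
goals = ["patriotism", "struggle spirit", "reform awareness"]

# Illustrative incidence matrices (rows: source type, columns: target type).
work_event = np.array([[1, 0],
                       [1, 0],
                       [0, 1]])
event_goal = np.array([[1, 1, 0],
                       [0, 0, 1]])


def metapath_counts(*adjacency):
    """Multiply adjacency matrices along a meta-path.

    Entry (i, j) of the result counts the path instances that connect
    source node i to target node j through the intermediate node types.
    """
    result = adjacency[0]
    for matrix in adjacency[1:]:
        result = result @ matrix
    return result


work_goal = metapath_counts(work_event, event_goal)
for i, work in enumerate(works):
    linked = [goals[j] for j in np.flatnonzero(work_goal[i])]
    print(f"{work}: {linked}")
```

A meta-path guided attention layer would typically restrict each node's neighborhood to the nonzero entries of such a matrix and then learn weights over those neighbors.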
It is important to note that the reported 23%–35% performance improvement mainly stems from IPE tasks focused on red music. This gain depends on the strong emotional features and clear educational objectives present in these resources. For IPE subtopics with weaker emotional drivers or less explicit educational content—such as integrity education or legal knowledge—the model may not achieve similar improvements. Future studies could expand its applicability to other IPE domains using techniques like emotion-enhanced content generation or cross-modal contextual reconstruction.
The study also shows that the effectiveness of multimodal feature fusion arises from the synergy between musical art and IPE. Features such as melodic excitement (e.g., high-frequency band proportion) and the emotional polarity of lyrics (e.g., positive vocabulary like “struggle” and “dedication”) jointly enhance emotional engagement. At the same time, dynamic cognitive diagnosis of learner profiles (e.g., real-time identification of knowledge gaps) combined with reinforcement learning–optimized recommendation strategies (e.g., adjusting resource difficulty based on completion) creates an adaptive ecosystem of “need identification → content generation → effect feedback.” This approach provides theoretical support for cross-cultural or multilingual applications of red culture education. However, in low-resource language settings, sparse emotion labeling and cultural differences in educational semantics may reduce recommendation effectiveness. Potential solutions include semi-supervised cross-lingual transfer learning, constructing cross-cultural emotion lexicons, and incorporating localized expert annotation.
The model also demonstrates strong robustness in recommending resources across historical periods. On the test set, loss is 23.6% lower than that of traditional models, indicating its ability to capture the temporal continuity of red music, such as the evolving semantic representation of “patriotism” across different works. This provides a strong technical foundation for the innovative inheritance of red culture. However, the current experiments mainly involve high school and university students and rely on large, well-annotated red music and educational metadata. In contexts such as primary education or adult learning, variations in cognitive level, interest, and emotional receptivity may influence recommendation performance. Likewise, small-scale or sparse datasets may reduce model stability and generalization. Future work should test the model across more diverse demographics, educational stages, and dataset sizes to delineate its applicability and improve generalizability.
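As an illustration of the “high-frequency band proportion” feature mentioned above, the sketch below computes a short-time Fourier transform with NumPy and measures, per frame, the share of spectral energy above a cutoff frequency. The 2 kHz cutoff, frame length, hop size, and synthetic test tones are assumptions chosen for the example; the paper's actual feature definition and its time-frequency attention mechanism are not reproduced here.

```python
import numpy as np


def stft_magnitude(signal, frame_len=1024, hop=512):
    """Magnitude STFT from Hann-windowed frames; shape (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))


def high_band_proportion(signal, sample_rate, cutoff_hz=2000.0,
                         frame_len=1024, hop=512):
    """Per-frame fraction of spectral energy at or above cutoff_hz."""
    energy = stft_magnitude(signal, frame_len, hop) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    total = energy.sum(axis=1) + 1e-12  # guard against silent frames
    return energy[:, freqs >= cutoff_hz].sum(axis=1) / total


# Synthetic one-second signals standing in for real recordings.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
low_tone = np.sin(2 * np.pi * 220 * t)                 # energy below the cutoff
bright_tone = low_tone + np.sin(2 * np.pi * 4000 * t)  # adds a high partial

low_ratio = high_band_proportion(low_tone, sample_rate).mean()
bright_ratio = high_band_proportion(bright_tone, sample_rate).mean()
print(f"low tone: {low_ratio:.3f}, bright tone: {bright_ratio:.3f}")
```

The brighter signal yields a markedly higher proportion than the low tone, which is the contrast such a feature is meant to expose before it is fused with lyric semantics.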
In summary, this study not only validates the effectiveness and novelty of integrating red music into IPE resource recommendation but also offers a practical pathway for extending multimodal, emotion-driven educational recommendation systems across domains and cultures. Nevertheless, optimal performance remains highly dependent on quantifiable emotional features, comprehensive knowledge graphs, and detailed learner profiles. In settings with weak emotional cues, substantial cultural differences, or limited data, the model may not perform as effectively as it does in the red music–themed IPE context.
Conclusion
Research contribution
The intelligent deep learning model developed in this study achieves a deep integration of red music with IPE resource recommendation. Theoretically, it introduces evaluation metrics for educational matching and emotional resonance, and establishes an educational logic framework linking “artistic features → cognitive profile → educational objectives,” overcoming the technical limitations of traditional recommendation models. Methodologically, it addresses the challenges of extracting I&P elements from red music and aligning them with learners’ dynamic needs through multimodal feature extraction, heterogeneous network modeling, and hierarchical reinforcement learning. In practice, the model demonstrates substantial improvements in recommendation accuracy and educational adaptability—up to 29%—and enhances student engagement in I&P teaching pilots. Overall, this approach offers a reusable technical paradigm for the digital preservation of red culture and the innovation of IPE, with strong potential for cross-disciplinary applications.
Future work and research limitations
Although this study has achieved technological breakthroughs in recommending red music IPE resources, there remains room for improvement. Currently, the model has limited capability in mining performance aspects of red music, such as visual elements in choruses and symphonies, as well as users’ physiological feedback, including EEG signals and eye-tracking data. Future work could incorporate video visual feature analysis and multimodal affective computing to enable a more quantitative evaluation of users’ emotional resonance. In addition, the model’s real-time recommendation efficiency under large-scale concurrent usage needs enhancement. Algorithmic complexity can be reduced through techniques such as model compression and distributed training. Another limitation lies in the underdeveloped tracking of learners’ long-term value identification. Follow-up research could establish a dynamic, evolving I&P literacy evaluation system through longitudinal data collection. Future studies also aim to broaden cross-cultural application scenarios and explore adaptive adjustments of red music education in international contexts. Simultaneously, the integration with educational practice can be strengthened, promoting the combined design of intelligent recommendation technology, I&P courses, and social practice. This approach will more comprehensively support the digital, personalized, and global development of IPE in the new era.
Data availability
The datasets used and/or analysed during the current study are available from the corresponding author, Lifang Zhang, on reasonable request via e-mail (zhanglifang@hncu.edu.cn).
Code availability
The code is accessible via the following permanent link: https://doi.org/10.5281/zenodo.10421362.
References
Liu, H. et al. Bell shape embodying zhongyong: the pitch histogram of traditional Chinese anhemitonic pentatonic folk songs. Appl. Sci. 12(16), 8343 (2022).
Liu, Y. Analysis of the implication and characteristics of red music in Jiangxi, China. Sciences of Conservation and Archaeology 36(3), 212–222 (2024).
Zhang, X. & Liu, H. Red or white? Color in Chinese folksongs. Digit. Scholarsh. Humanit. 36(1), 225–241 (2021).
Ma, X. An examination of the traits and effects of red songs on education, realistic inspiration and value, and culture during the Yan’an period. Edelweiss Appl. Sci. Technol. 8(2), 263–273 (2024).
Scharinger, M. et al. Melody in poems and songs: fundamental statistical properties predict aesthetic evaluation. Psychology of Aesthetics, Creativity, and the Arts 17(2), 163 (2023).
Putirulan, M. M., Cahya, R. A. & Latuihamallo, C. I. An analysis of lexical cohesion found in Red song lyrics. HUELE: Journal of Applied Linguistics, Literature and Culture 3(1), 26–34 (2023).
Chen, J., Suvimolstien, C. & Khochprasert, J. Jiangxi folk songs: from cultural clues of ceramic music to cultural management. J. Roi Kaensarn Academi 8(12), 602–618 (2023).
Clark, B. & Arthur, C. Is melody dead? A large-scale analysis of pop music melodies from 1960 through 2019. Empir. Musicology Rev. 17(2), 120–149 (2022).
Chang, L. Inheritance status and performance practice of red violin works under the background of the founding of the party for a century. Significance 3, 4 (2021).
Bai, Z. & Wu, C. J. Analysis of the style and characteristics of Chinese piano performance in the 20th century from the perspective of ethnicity. Herança 8(1), 190–205 (2025).
Urdaneta-Ponte, M. C., Mendez-Zorrilla, A. & Oleagordia-Ruiz, I. Recommendation systems for education: systematic review. Electronics 10(14), 1611 (2021).
Machado, G. M. et al. AwARE: a framework for adaptive recommendation of educational resources. Computing 103(4), 675–705 (2021).
Tavakoli, M. et al. An AI-based open recommender system for personalized labor market driven education. Adv. Eng. Inform. 52, 101508 (2022).
Okubo, F. et al. Adaptive learning support system based on automatic recommendation of personalized review materials. IEEE Trans. Learn. Technol. 16(1), 92–105 (2022).
Raj, N. S. & Renumol, V. G. A systematic literature review on adaptive content recommenders in personalized learning environments from 2015 to 2020. J. Computers Educ. 9(1), 113–148 (2022).
Fu, R., Tian, M. & Tang, Q. The design of personalized education resource recommendation system under big data. Comput. Intell. Neurosci. 2022(1), 1359730 (2022).
Zhu, Y. Personalized recommendation of educational resource information based on adaptive genetic algorithm. Int. J. Reliab. Qual. Saf. Eng. 30(02), 2250014 (2023).
Xu, Y. & Chen, T. The design of personalized learning resource recommendation system for ideological and political courses. Int. J. Reliab. Qual. Saf. Eng. 30(01), 2250020 (2023).
Bhaskaran, S. & Marappan, R. Enhanced personalized recommendation system for machine learning public datasets: generalized modeling, simulation, significant results and analysis. Int. J. Inform. Technol. 15(3), 1583–1595 (2023).
Gm, D. et al. A digital recommendation system for personalized learning to enhance online education: A review. IEEE Access 12, 34019–34041 (2024).
Hashim, S. et al. Trends on technologies and artificial intelligence in education for personalized learning: systematic literature. J. Acad. Res. Progressive Educ. Dev. 12(1), 884–903 (2022).
Xu, G. et al. Personalized course recommendation system fusing with knowledge graph and collaborative filtering. Comput. Intell. Neurosci. 2021(1), 9590502 (2021).
Bernacki, M. L., Greene, M. J. & Lobczowski, N. G. A systematic review of research on personalized learning: personalized by whom, to what, how, and for what purpose(s)? Educational Psychol. Rev. 33(4), 1675–1715 (2021).
Gumbheer, C. P., Khedo, K. K. & Bungaleea, A. Personalized and adaptive context-aware mobile learning: review, challenges and future directions. Educ. Inform. Technol. 27(6), 7491–7517 (2022).
Bhutoria, A. Personalized education and artificial intelligence in the United States, China, and India: A systematic review using a human-in-the-loop model. Computers Education: Artif. Intell. 3, 100068 (2022).
Nguyen, A. et al. Ethical principles for artificial intelligence in education. Educ. Inform. Technol. 28(4), 4221–4241 (2023).
Meddeb, O., Maraoui, M. & Zrigui, M. Personalized smart learning recommendation system for Arabic users in smart campus. Int. J. Web-Based Learn. Teach. Technol. (IJWLTT) 16(6), 1–21 (2021).
Ayeni, O. O. et al. AI in education: A review of personalized learning and educational technology. GSC Adv. Res. Reviews 18(2), 261–271 (2024).
Tahir, S. et al. Smart learning objects retrieval for E-Learning with contextual recommendation based on collaborative filtering. Educ. Inform. Technol. 27(6), 8631–8668 (2022).
Wang, M. & Lv, Z. Construction of personalized learning and knowledge system of chemistry specialty via the internet of things and clustering algorithm. J. Supercomputing 78(8), 10997–11014 (2022).
Klašnja-Milićević, A. & Ivanović, M. E-learning personalization systems and sustainable education. Sustainability 13(12), 6713 (2021).
Amin, S. et al. Smart E-learning framework for personalized adaptive learning and sequential path recommendations using reinforcement learning. IEEE Access 11, 89769–89790 (2023).
Shemshack, A., Kinshuk & Spector, J. M. A comprehensive analysis of personalized learning components. J. Computers Educ. 8(4), 485–503 (2021).
Jian, M. J., K O. Personalized learning through AI. Adv. Eng. Innov. 5, 16–19 (2023).
Wan, H. & Yu, S. A recommendation system based on an adaptive learning cognitive map model and its effects. Interact. Learn. Environ. 31(3), 1821–1839 (2023).
Kohnke, L., Moorhouse, B. L. & Zou, D. Exploring generative artificial intelligence preparedness among university language instructors: A case study. Computers Education: Artif. Intell. 5, 100156 (2023).
Funding
This work was supported by the 2024 Hunan Provincial Undergraduate Colleges and Universities Teaching Reform Research Key Project “Research on the Integration Teaching Mode of Music Theory Courses of ‘Aesthetic + Discursive’ under the Perspective of ‘Great Ideology and Politics’” (Grant No. 202401001244).
Author information
Contributions
Lifang Zhang: Conceptualization, methodology, software, validation, formal analysis, investigation, resources, data curation, writing—original draft preparation, writing—review and editing, visualization, supervision, project administration, funding acquisition.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics statement
The studies involving human participants were reviewed and approved by the Ethics Committee of the School of Music and Dance, Hunan City University (Approval Number: 2022.9374342). The participants provided their written informed consent to participate in this study. All methods were performed in accordance with relevant guidelines and regulations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhang, L. Intelligent deep learning model for recommending ideological and political music education resources. Sci Rep 15, 36402 (2025). https://doi.org/10.1038/s41598-025-20535-3
DOI: https://doi.org/10.1038/s41598-025-20535-3