Abstract
This study proposes a CLT-grounded, three-tier interactive annotation model to mitigate cognitive overload and enhance knowledge acquisition in cultural-heritage serious games. Using the Shimao Ruins virtual platform, we evaluated behavior logs (interaction frequency, completion time), knowledge tests (immediate/delayed), and user feedback. The experimental group outperformed the control group in short-term recall (84.7% vs 64.6%) and long-term retention (72.3% vs 54.1%). Regression showed interaction frequency positively predicted learning (β = 0.87, p < 0.001), whereas task duration negatively correlated with performance (β = −0.29, p = 0.028). The model reduces extraneous load while fostering germane processing through progressive tasks and information granularity. Results support broader applicability and point to future directions in mobile AR/VR integration and adaptive difficulty for real-time load regulation.
Introduction
In the context of globalization, the preservation and dissemination of cultural heritage serve not only as a means of safeguarding historical memory but also as a critical pathway for fostering social identity and cross-cultural understanding1. Digital tools in cultural heritage education have significantly expanded the scope and depth of cultural dissemination. However, the intrinsic richness and complexity of cultural information often hinder efficient transmission, leading to insufficient public awareness and emotional connection with heritage sites2. In recent years, serious games have emerged as a powerful tool for cultural heritage education, combining educational and entertainment values. These digital platforms leverage interactivity and immersive experiences to effectively enhance users’ cultural engagement and knowledge acquisition3,4. Particularly, serious games incorporating virtual reality (VR) and augmented reality (AR) technologies have expanded the boundaries of cultural heritage dissemination, offering innovative multi-sensory approaches for knowledge transfer and experiential learning5.
Despite their potential, existing serious game designs often focus on interactivity and entertainment, with limited attention given to systematically managing learners’ cognitive load6. Studies have shown that the high complexity of cultural information in heritage education frequently results in cognitive overload, leading to learner fatigue and reduced efficiency7. When learners are required to process large volumes of historical information within a limited timeframe, the challenge of balancing enhanced learning experiences with effective cognitive load management becomes a critical issue in designing cultural heritage education tools. This study proposes a multi-level interactive annotation model grounded in Cognitive Load Theory (CLT) to address this challenge. The model optimizes the complexity of information and task design, dynamically allocating learners’ cognitive resources to reduce extraneous load while enhancing germane load. By fine-tuning the hierarchical structure of annotations, the model facilitates progressive and adaptive information presentation, significantly improving learners’ comprehension of cultural content and emotional engagement. Using the Shimao Ruins virtual education platform as a case study, this research validates the model’s effectiveness through quantitative and qualitative analyses. By integrating cognitive science principles into serious game design, the study aims to provide an innovative framework for cultural heritage education while advancing theoretical and practical applications in serious game development.
Serious games, a term introduced by Clark Abt in 1970, refer to games that combine entertainment with educational, training, or other practical purposes. This concept has rapidly gained attention from both academia and industry, finding applications across diverse domains such as education, military training, healthcare, business management, and social services8. In the educational domain, serious games enhance learners’ engagement and motivation by integrating complex learning content with gamified mechanisms. Their immersive experiences and interactive features enable learners to construct knowledge more effectively and achieve deeper learning outcomes9,10. Additionally, studies indicate that well-designed tasks and narratives in serious games can promote long-term retention of learning content11.
In cultural heritage education, serious games have revitalized traditional approaches to cultural dissemination. Through virtual environments and interactive designs, they offer learners a vivid and intuitive way to engage with complex historical and cultural information12. For instance, some serious games have been successfully employed in museum exhibitions and heritage site reconstructions, facilitating deeper user comprehension of the historical context and value of cultural artifacts. However, most existing research prioritizes enhancing the interactivity, immersion, and entertainment value of serious games while neglecting the cognitive load challenges learners face when processing complex cultural information. This oversight often leads to cognitive overload, impairing knowledge acquisition and learning outcomes13.
The primary objective of cultural heritage education is to effectively disseminate knowledge, fostering learners’ understanding, retention, and appreciation of cultural heritage. With advancements in VR and AR technologies, cultural heritage education has increasingly embraced digital, interactive, and immersive approaches14. These technologies enable the dynamic and vivid presentation of historical sites and cultural resources through three-dimensional visualization and real-time interaction, significantly improving the accessibility and efficacy of cultural knowledge dissemination15. For instance, VR-based reconstructions of historical scenarios allow learners to explore heritage sites within virtual environments, enabling them to absorb complex information more efficiently within a shorter time frame16.
Despite these technological and educational breakthroughs, many cultural heritage games remain overly focused on technology-driven aspects, overlooking learners’ psychological and cognitive needs. The inherent complexity and multilayered nature of cultural heritage knowledge often exceed learners’ cognitive processing capacities, leading to cognitive overload and learning fatigue17. Designing cultural heritage games that balance the complexity of information presentation with cognitive manageability is a pressing challenge that must be addressed to enhance both user experience and learning outcomes.
CLT posits that human cognitive resources are limited and categorizes cognitive load into intrinsic load, extraneous load, and germane load. Effective management of these three types of cognitive load is critical to enhancing learning efficiency. Thus, optimizing task design and information presentation to minimize extraneous load while enhancing germane load remains a pivotal topic in educational research18,19,20. In recent years, CLT has been increasingly integrated into game design. Researchers have explored strategies such as adjusting game difficulty levels, decomposing tasks, and crafting narrative elements to optimize cognitive resource allocation for learners21. However, most studies have only applied CLT at a basic level, with limited exploration of how it can be tailored to meet the unique requirements of cultural heritage education. For example, the effective application of CLT in designing multi-level cultural knowledge annotations and progressive learning tasks to ensure both deep comprehension and sustained learner engagement remains an underexplored area22,23.
This study proposes a dynamic interaction model and cognitive load control strategy that progressively sequences tasks and adjusts the granularity of information annotations. This approach significantly reduces extraneous load, optimizes germane load allocation, and enhances the efficiency and retention of cultural knowledge acquisition. The proposed model features a hierarchical and adaptive structure that matches learners’ cognitive capacities with task complexities. This design effectively addresses cognitive overload while increasing learner engagement and cultural appreciation. By incorporating VR and AR technologies, the study transcends the limitations of traditional unidirectional cultural presentations. The model establishes an integrated framework for immersive education and interactive annotation, providing a direct and efficient solution for the dissemination of complex knowledge.
Using the Shimao Ruins virtual education platform as a case study, the study employs both quantitative and qualitative analyses to validate the model’s effectiveness in reducing cognitive load and enhancing cultural knowledge comprehension and retention. Looking forward, the proposed design framework demonstrates strong adaptability and potential for broader application, particularly in various cultural heritage contexts such as dynamic scene reconstruction, cross-cultural exchange, and the presentation of complex historical narratives. Further exploration of AR technologies could also enhance interaction flexibility and contextual adaptability. As VR and AI technologies continue to advance, serious game design can increasingly meet the diverse needs of learners, driving cultural heritage education from passive dissemination toward active engagement, ultimately fostering the global dissemination of cultural knowledge and enhancing cultural identity.
Methods
Cognitive load management and design strategy
This study adopts CLT as the theoretical foundation to inform the design of interactive annotation systems for cultural heritage. The complexity and multilayered nature of information in cultural heritage serious games often result in excessive cognitive load, negatively impacting immersion and knowledge acquisition. To address this issue, the study introduces a three-tier interactive annotation model grounded in CLT. The model balances the depth of information presentation with learners’ stages of knowledge acquisition to effectively manage cognitive load. By continuously monitoring user behavior data—such as click frequency, dwell time, and task completion rates—and integrating user feedback, the system dynamically adjusts information presentation to maintain cognitive load at an optimal level. This dynamic adjustment helps prevent cognitive overload and learning fatigue, ensuring a seamless and engaging learning experience.
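The dynamic adjustment described above can be illustrated with a minimal sketch. It is not the platform's actual controller: the load heuristic, the equal weighting, the 60-second dwell cap, and the 0.3/0.7 thresholds are all assumptions chosen for illustration, and click frequency is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class BehaviorSample:
    """One window of logged behavior (field names are hypothetical)."""
    dwell_time_s: float      # mean dwell time per annotation, in seconds
    completion_rate: float   # fraction of tasks completed, 0.0 to 1.0


def estimate_load(sample: BehaviorSample, max_dwell_s: float = 60.0) -> float:
    """Heuristic load score in [0, 1]; higher suggests a risk of overload.

    Assumption: long dwell times and low completion rates both signal
    that the learner is struggling with the current tier.
    """
    dwell = min(sample.dwell_time_s / max_dwell_s, 1.0)
    incomplete = 1.0 - sample.completion_rate
    return 0.5 * dwell + 0.5 * incomplete


def adjust_tier(current_tier: int, sample: BehaviorSample,
                low: float = 0.3, high: float = 0.7) -> int:
    """Step down a tier when load is high, step up when it is low."""
    load = estimate_load(sample)
    if load > high:
        return max(1, current_tier - 1)   # tier 1 = basic
    if load < low:
        return min(3, current_tier + 1)   # tier 3 = advanced
    return current_tier


struggling = BehaviorSample(dwell_time_s=60.0, completion_rate=0.2)
fluent = BehaviorSample(dwell_time_s=10.0, completion_rate=0.95)
print(adjust_tier(2, struggling), adjust_tier(2, fluent))
```

In a real deployment the thresholds would need to be calibrated against measured cognitive load rather than fixed in advance.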
Three-tier interactive annotation model design
The study proposes a three-tier interactive annotation model that integrates CLT and adaptive learning theories. The model employs a progressive structure of information presentation and task design to guide users through the gradual acquisition of cultural heritage knowledge. Key design principles include cognitive load management, incremental information depth, and increasing task complexity. Together, these principles ensure a scientifically rigorous and engaging learning process. Through dynamic adjustments and the integration of multi-tier tasks, users can achieve cultural knowledge acquisition at their own pace and according to their learning objectives, thereby mitigating the risk of information overload (see Fig. 1).
Cognitive load management is a core function of the three-tier interactive annotation model, which aims to regulate the amount of information and task complexity to match the cognitive processing capacity of users. This approach prevents information overload and enhances learning outcomes. Grounded in CLT, the model segments the learning process into three levels, dynamically adjusting cognitive load at each level to ensure effective knowledge acquisition. At the basic level, essential information about artifacts, such as name and purpose, is provided, prioritizing key content to reduce cognitive load and establish foundational knowledge. Interaction methods are kept simple, such as clicking or scanning, to minimize the volume of information and task complexity. The intermediate level introduces additional details, such as material composition, craftsmanship, and design elements, after users have grasped basic knowledge. This stage moderately increases the information volume and task complexity, using exploratory tasks to deepen understanding. The advanced level requires users to integrate information from the previous two levels through complex tasks such as reasoning, judgment, and puzzle-solving. This stage stimulates higher-order cognitive activities, enhancing understanding of artifacts and the integration of cultural knowledge. Through this progressive cognitive load management, the system dynamically adjusts user engagement, optimizes learning paths, and improves knowledge retention (see Fig. 2).
Incremental information depth is a key design strategy of the three-tier interactive annotation model. By progressively revealing the depth and breadth of artifact information, the system guides users from foundational knowledge to deeper comprehension, fostering a comprehensive understanding of cultural heritage. At the basic level, fundamental information about artifacts, such as name, function, and context, is offered to help users establish an initial understanding. Information is presented in a concise and intuitive manner to lower the learning threshold and create a conceptual framework. The intermediate level expands on details such as material properties, craftsmanship, and design elements through multidimensional presentations. This approach enhances users’ understanding of the physical and cultural contexts of the artifact while leveraging user interaction to prevent information overload. The advanced level focuses on complex historical and cultural significance, guiding users to explore the role and value of artifacts within specific historical contexts. Information is presented with an emphasis on holistic and systematic narratives, enabling users to build a deeper cognitive framework. This gradual information presentation enables users to systematically learn cultural knowledge, improving their understanding of artifacts and their cultural significance (see Fig. 3).
The complexity of interaction tasks is a crucial component of the three-tier interactive annotation model. The system progressively increases operational difficulty and task challenges to stimulate user engagement and enhance knowledge acquisition. At the basic level, simple tasks are featured, where users interact with artifacts through basic actions, such as clicking or scanning, to access essential information. This ensures a smooth interactive experience and allows users to focus on acquiring foundational knowledge. The intermediate level builds on foundational knowledge with more complex tasks, such as rotating, zooming, or clicking specific areas to unlock additional details. This level fosters deeper artifact exploration and enhances user engagement. The advanced level presents users with high-level tasks, such as reasoning, puzzle-solving, and contextual reconstruction, requiring integration of information from earlier levels. These tasks increase the learning challenge, strengthen motivation, and improve logical reasoning and knowledge synthesis. This multi-tiered task complexity design not only enhances the game’s appeal but also empowers users to engage in active learning, enabling them to acquire knowledge effectively through meaningful interactions (see Fig. 4).
By combining cognitive load management, incremental information depth, and task complexity, the three-tier interactive annotation model optimizes knowledge acquisition in cultural heritage serious games. Specifically, precise cognitive load regulation prevents information overload, dynamic information presentation ensures seamless and comprehensive learning, and tiered task complexity enhances user motivation and interaction quality. This study provides innovative strategies for serious game design, advancing the scientific and effective dissemination of cultural knowledge.
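As a compact summary of the three tiers, the following sketch encodes each level's information fields and interaction types as a small data structure. The field and interaction names paraphrase the descriptions above. The class names and the cumulative-disclosure rule (each tier also shows everything from the tiers below it) are illustrative assumptions, not details taken from the Shimao Cloudscape implementation.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    BASIC = 1
    INTERMEDIATE = 2
    ADVANCED = 3


@dataclass(frozen=True)
class TierSpec:
    info_fields: tuple    # what is revealed at this tier
    interactions: tuple   # how the learner interacts at this tier


TIER_SPECS = {
    Tier.BASIC: TierSpec(
        ("name", "function", "context"),
        ("click", "scan"),
    ),
    Tier.INTERMEDIATE: TierSpec(
        ("material", "craftsmanship", "design elements"),
        ("rotate", "zoom", "click area"),
    ),
    Tier.ADVANCED: TierSpec(
        ("historical and cultural significance",),
        ("reasoning", "puzzle-solving", "contextual reconstruction"),
    ),
}


def visible_fields(tier: Tier) -> list:
    """Return all fields revealed at or below `tier` (assumed cumulative)."""
    fields = []
    for t in Tier:  # iterates in definition order: BASIC, INTERMEDIATE, ADVANCED
        if t <= tier:
            fields.extend(TIER_SPECS[t].info_fields)
    return fields


print(visible_fields(Tier.INTERMEDIATE))
```

Keeping the tier definitions in data rather than in control flow would let the granularity of each level be tuned without touching the interaction code.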
Application in practice
This study uses the Shimao Ruins in Shaanxi Province as a case study to explore the application of the three-tier interactive annotation model in cultural heritage education. The model aims to enhance learners’ absorption of cultural knowledge. Based on the proposed design framework, a serious game titled Shimao Cloudscape was developed. By virtually reconstructing key elements of the Shimao Ruins, the game creates “authentic” immersive scenarios, allowing learners to explore and engage with the site interactively. Through task-based interactions, learners progressively gain insights into the historical background, cultural significance, and the critical role of the Shimao Ruins in ancient Chinese society. The game design encourages cognitive and emotional engagement across different tiers, fostering a deeper understanding of the site’s cultural value and the importance of heritage preservation.
Rooted in the rich cultural and historical context of the Shimao Ruins, this study leverages the site as a case to enhance cultural dissemination through digital tools and serious game design. As a significant representative of the Longshan Culture and one of the foundational sites of early Chinese civilization, the Shimao Ruins hold immense archeological and cultural value. Since its discovery in 1976, it has served as a pivotal resource for understanding the origins and evolution of Chinese civilization. The site’s complex cultural coordinates and abundant relics, particularly those related to early kingdom-building, not only demonstrate the depth of Chinese civilization but also provide valuable evidence for studying the structure and development of ancient Chinese society.
Despite recent efforts to protect and promote the Shimao Ruins using digital means, existing educational methods remain largely reliant on traditional exhibits and explanations24. These approaches lack sufficient interactivity and immersive elements, failing to fully engage learners or stimulate cognitive participation. To address these gaps, this study integrates CLT and the three-tier interactive annotation model, proposing an innovative educational framework that uses virtual reconstruction and interactive experiences to foster a profound understanding of the cultural essence of the Shimao Ruins and improve learners’ cultural knowledge retention.
Design of Shimao Cloudscape annotations
To effectively convey the cultural significance of the Shimao Ruins and enhance learners’ comprehension, this study incorporates the three-tier interactive annotation model based on CLT. The design strategically progresses through three levels—basic, intermediate, and advanced—optimizing the presentation of artifact information and the structuring of interactive tasks. The basic level offers concise foundational information about artifacts, enabling learners to quickly establish initial cognitive frameworks. The intermediate level encourages learners to explore artifact details and contexts through interactive operations, deepening their cultural understanding. The advanced level challenges learners to synthesize previously acquired information through reasoning and contextual reconstruction tasks, promoting comprehensive cultural and historical insight. This tiered design enhances learning effectiveness while increasing engagement and enjoyment, providing an innovative pathway for the digital dissemination of cultural heritage (Fig. 5).
The basic-level annotation design focuses on presenting fundamental information to help learners quickly develop an initial understanding of artifacts. The goal is to provide essential details, such as artifact names, functions, and historical backgrounds, in a straightforward manner to prevent cognitive overload. Textual descriptions are paired with clear visuals, ensuring that users can grasp key content without being overwhelmed by excessive details.
For instance, when interacting with Artifact No. 11, a stone carving on the virtual platform, the system displays foundational information, including its name, excavation site, historical period, and intended use. These details are delivered succinctly, avoiding unnecessary complexity while ensuring that learners focus on core content. This approach minimizes cognitive load, allowing learners to acquire basic knowledge within a low-pressure environment (Fig. 6).
In terms of interaction, the basic level employs intuitive actions such as clicking or scanning to trigger information displays. This straightforward mechanism is especially suited to users new to cultural heritage content. By simplifying operational processes, this level enhances user experience, reduces cognitive effort, and increases efficiency in accessing information. Additionally, it establishes a cognitive foundation for learners, preparing them for deeper exploration in subsequent levels (Fig. 7).
Building on the basic level, the intermediate level introduces more diverse and engaging tasks to deepen learners’ understanding of artifacts. Moving beyond merely acquiring foundational knowledge, learners are encouraged to actively explore artifact details and historical contexts through specific interactions. Tasks may involve operations such as rotating, zooming, or clicking specific sections to unlock additional content. This design not only boosts engagement but also motivates learners to explore further.
For example, when rotating Artifact No. 11, learners may discover additional information on its craftsmanship, material selection, and role within its historical context. These tasks allow users to examine the artifact from multiple perspectives, gradually building a more comprehensive understanding of its cultural and historical significance. Tasks at this level are moderately challenging, maintaining a balanced cognitive load to prevent fatigue while offering meaningful engagement (Fig. 8).
The interaction mechanisms at this level are more flexible, incorporating operations such as rotation, zooming, and area-specific clicks. These are complemented by immediate feedback mechanisms; for example, completing an action triggers the display of new information, fostering active learning. By combining tasks and feedback, learners transition from passive information receivers to active knowledge explorers. This level ensures sustained interest and engagement, equipping learners for more advanced exploration in the final tier (Fig. 9).
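The action-then-feedback loop at the intermediate level can be sketched as follows. This is a hypothetical illustration: the class name, the action names, the placeholder annotation strings, and the rule that the advanced tier opens once every intermediate action has been performed are assumptions, not documented behavior of the platform.

```python
from typing import Optional


class IntermediateSession:
    """Tracks which intermediate-level interactions a learner has completed."""

    def __init__(self, unlockable: dict):
        # Maps an action name to the annotation text it reveals.
        self.unlockable = dict(unlockable)
        self.done = set()

    def perform(self, action: str) -> Optional[str]:
        """Return the annotation unlocked by `action`, or None if nothing new."""
        if action not in self.unlockable or action in self.done:
            return None
        self.done.add(action)
        return self.unlockable[action]  # immediate feedback: new information

    @property
    def ready_for_advanced(self) -> bool:
        """Assumed gate: all intermediate actions must be completed."""
        return self.done == set(self.unlockable)


session = IntermediateSession({
    "rotate": "craftsmanship notes (placeholder text)",
    "zoom": "material notes (placeholder text)",
})
print(session.perform("rotate"))        # reveals the craftsmanship placeholder
print(session.perform("rotate"))        # repeating an action reveals nothing new
print(session.ready_for_advanced)       # zoom has not been performed yet
```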
The advanced level represents the core of the interactive annotation system, focusing on complex tasks and multi-layered interactions to enhance cultural understanding and knowledge integration. At this level, learners are expected to apply previously acquired knowledge while engaging in critical thinking, deep reasoning, and solving complex challenges. These tasks—such as contextual reconstruction, puzzles, and logic-based exploration—reveal deeper cultural insights embedded within artifacts.
For example, learners may be tasked with piecing together historical clues from multiple artifacts to reconstruct a specific historical event, illustrating the artifacts’ societal roles. This process involves not only understanding basic artifact information but also contextualizing its cultural and historical significance. Through such interactions, learners gain a nuanced appreciation of artifacts’ cultural value, social functions, and roles within the site.
Tasks at this level are designed with greater complexity and challenge, requiring learners to integrate knowledge from the earlier levels for critical analysis. These high-level interactions demand sustained cognitive engagement, enabling learners to enhance knowledge retention and achieve deeper cultural understanding. Such advanced tasks not only strengthen learning outcomes but also facilitate knowledge internalization and refinement (Fig. 10).
User behavior experiment
The study recruited 100 participants using stratified sampling to ensure demographic diversity and ecological validity. The cohort consisted of individuals aged 18–55 years, with representation across three key dimensions: cultural background (40% East Asian, 30% Western Asian, 30% other), educational attainment (40% bachelor’s degree holders, 35% master’s degree holders, 25% vocational education), and prior expertise in cultural heritage (25% professionals in history/design fields vs 75% general public). To ensure statistical robustness, an a priori power analysis was conducted using G*Power 3.1. This analysis determined that a minimum of 64 participants would be required to achieve a statistical power of 0.8 (α = 0.05, effect size f = 0.25). The final sample size of 100 participants exceeded this threshold to accommodate potential attrition, ensuring the robustness of the findings.
While the sample size was calculated based on statistical power analysis, we acknowledge that the adequacy of the sample size may vary depending on the complexity and design of the study. As this study involved a multi-level interactive annotation model, which required careful consideration of the participants’ ability to engage with the system at different levels, a larger sample could potentially provide more detailed insights into the variability of the results. Future studies could consider expanding the sample size to further explore the generalizability of these findings across different populations and settings. Additionally, future research could also involve conducting similar experiments in various cultural contexts to validate the results and assess whether the sample size and statistical power are adequate for different heritage education environments.
The experiment involved two groups: the experimental group and the control group. To ensure the validity of the experimental results and control for any potential confounding variables related to equipment differences, both groups were assigned specific and consistent equipment configurations. The experimental group used Oculus Quest 2 VR headsets (72 Hz refresh rate, 128 GB storage) with hand-tracking enabled. This provided an immersive, interactive learning environment, allowing participants to engage with the three-tier interactive annotation system. The VR platform ensured fidelity in the interactive tasks, including gaze fixation for basic tasks, rotating and zooming for intermediate tasks, and multi-artifact puzzle-solving for advanced tasks.
In contrast, the control group interacted with the same content through a 2D desktop interface on Dell Precision 3560 laptops (1920 × 1080 resolution, 16 GB RAM). The desktop interface presented identical historical and cultural information as a linear slideshow of static slides with manual navigation buttons, without the interactive and progressive features of the experimental group’s system. The content was displayed in a fixed order, with no dynamic adjustments to task complexity as participants progressed.
While both groups were exposed to the same content, the main difference between them was the interactivity and task complexity. The experimental group’s interaction model was designed to increase task complexity progressively, from basic information to more advanced cognitive tasks. In the VR-based system, users engaged with artifacts interactively, rotating objects, zooming in on details, and solving puzzles, which required active cognitive engagement. The control group’s linear slideshow, by contrast, required users only to click through each artifact’s description. The control group’s tasks were therefore less cognitively demanding, as they did not require manipulation of or engagement with the artifacts beyond reading the content.
To ensure that any differences in learning outcomes were not influenced by equipment-related biases, several measures were implemented. First, the content presented to both groups was identical in terms of historical and cultural data, ensuring that the content itself did not vary. Second, the experimental variable, the interactive annotation system, was the only intended difference between conditions: the VR environment offered the immersive three-tier experience, while the control group received equivalent content through manual navigation. This allowed the two systems to be compared with the interaction format as the primary variable.
To further control for equipment-related variability, all participants received uniform instructions on how to interact with the system, ensuring consistency in user behavior. Both groups completed the same tasks under controlled conditions, minimizing any potential effects from hardware differences. By maintaining the consistency of content and task structure, we ensured that any observed differences in knowledge acquisition and cognitive load were due to the design of the annotation system, rather than the equipment used.
Results
Data collection
Behavioral data were automatically recorded to evaluate engagement and learning efficiency. The metrics included task completion time, interaction frequency, and annotation click rate.
Findings: The experimental group’s average task completion time was 7.2 ± 1.5 min, increasing to 9.1 ± 2.3 min for higher-difficulty tasks and decreasing to 5.8 ± 1.0 min for simpler ones. Interaction frequency in the experimental group averaged 32.5 ± 8.3 times, significantly higher than the control group’s 14.8 ± 4.2 times, indicating greater engagement. Annotation click rates in the experimental group reached 68% ± 12%, notably higher than the control group’s 35% ± 9%, demonstrating more frequent use of annotation features. These metrics formed the basis for subsequent group comparisons. Figure 11 illustrates the differences in task completion time and interaction frequency between the experimental and control groups.
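To give a sense of the size of these group differences, the reported means and standard deviations can be converted into a standardized effect size. The sketch below computes Cohen's d with a pooled standard deviation. It assumes equal group sizes, which the text does not state, so the values are indicative only and are not results reported by the study.

```python
import math


def cohens_d(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Cohen's d using a pooled SD; assumes both groups are the same size."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Summary statistics as reported above (experimental vs control).
d_interaction = cohens_d(32.5, 8.3, 14.8, 4.2)   # interaction frequency
d_click_rate = cohens_d(68.0, 12.0, 35.0, 9.0)   # annotation click rate, in %

print(f"interaction frequency: d = {d_interaction:.2f}")
print(f"annotation click rate: d = {d_click_rate:.2f}")
```

Under the equal-size assumption both values come out well above 2 (about 2.69 and 3.11), far beyond the conventional 0.8 benchmark for a large effect.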
Two types of knowledge tests were conducted: immediate and delayed. Immediate tests were administered after each gaming session to assess short-term memory and understanding. The experimental group scored 84.7% ± 5.6% on average, compared with 64.6% for the control group, demonstrating superior short-term recall. Delayed tests were conducted 1 week post-experiment to evaluate long-term retention. The experimental group achieved a retention rate of 72.3% ± 7.8%, significantly outperforming the control group’s 54.1% ± 9.2%, indicating a clear advantage of the interactive annotation system (Fig. 12).
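The two test scores can also be related to one another with simple arithmetic. The sketch below uses the short-term recall figures given in the Abstract (84.7% and 64.6%) together with the delayed scores reported here. The "share retained" ratio is an illustrative derived quantity, not a metric the study reports.

```python
# Group-level mean scores in percent: (immediate, delayed).
scores = {
    "experimental": (84.7, 72.3),
    "control": (64.6, 54.1),
}

# Between-group gaps in percentage points.
immediate_gap = scores["experimental"][0] - scores["control"][0]
delayed_gap = scores["experimental"][1] - scores["control"][1]

# Share of the immediate score still present one week later.
retained = {group: delayed / immediate
            for group, (immediate, delayed) in scores.items()}

print(f"immediate gap: {immediate_gap:.1f} points")
print(f"delayed gap:   {delayed_gap:.1f} points")
for group, share in retained.items():
    print(f"{group}: {share:.1%} of the immediate score retained")
```

Read this way, the experimental group's lead of roughly 20 points on the immediate test narrows only slightly, to about 18 points, after one week.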
Upon completing the experiment, participants filled out feedback surveys to evaluate the annotation system’s effectiveness. Using a five-point Likert scale, participants rated the system on information richness, task difficulty, and learning effectiveness.
Findings: Information richness: 4.2 ± 0.5, task difficulty: 3.8 ± 0.7, learning effectiveness: 4.4 ± 0.6.
Participants acknowledged the system’s ability to provide substantial information and improve learning outcomes.
For subjective experiences, participants rated the system on game design, interactivity, and enjoyment: Game design: 4.5 ± 0.4, Interactivity: 4.3 ± 0.5, Enjoyment: 4.1 ± 0.6. These results highlight the system’s impact on enhancing engagement and enjoyment during gameplay. Figure 13 presents a summary of feedback scores.
To gain a deeper understanding of user experiences and identify potential areas for improvement, qualitative data were collected through interviews and focus group discussions. These sessions provided valuable insights into participants’ perceptions of the interactive annotation system, its effectiveness in knowledge acquisition, and the challenges they encountered during the study. The data were analyzed using thematic analysis, a widely recognized qualitative research method that allows for the identification and interpretation of patterns within the data. A coding scheme was developed based on the key themes emerging from the participants’ feedback. The process involved multiple stages: initial open coding, followed by axial coding to group related codes into broader themes, and finally, selective coding to identify the core themes that directly related to the research questions.
The thematic analysis revealed several key themes. One of the most prominent was information richness, as participants consistently reported that the interactive annotation system significantly enhanced their understanding of cultural heritage content. The system’s ability to present detailed, contextually relevant information in an engaging manner, along with the progressive nature of the three-tier annotation model, was particularly praised. In terms of task complexity and engagement, the majority of participants felt that the tasks were appropriately challenging, which fostered both cognitive engagement and enjoyment. However, a small subset of novice users expressed frustration, particularly during advanced tasks, which they found too complex. This feedback is valuable for refining the system’s difficulty scaling and providing more tailored learning experiences.
Regarding user interaction and system usability, most participants praised the interactivity and intuitive nature of the system, with the VR group especially appreciating the immersive qualities of the platform. However, some participants in the control group indicated that, while the 2D interface was functional, it did not offer the same level of immersion as the VR experience. In terms of suggestions for improvement, participants offered several actionable recommendations, such as increasing the use of visual cues, reducing repetitive task operations, and offering more frequent feedback during key tasks. These insights will help guide future iterations of the annotation system, ensuring it better meets user needs and enhances learning outcomes.
The results of the thematic analysis not only validate the effectiveness of the interactive annotation system in enhancing user engagement and learning, but also provide specific directions for improving the design. By incorporating these qualitative insights, future versions of the system can be optimized to address the challenges identified by users, ensuring that the system accommodates a wide range of learning preferences and cognitive abilities.
Experimental controls
All experiments were conducted in controlled environments to ensure data validity and consistency. Experimental conditions were standardized, minimizing environmental factors (e.g., lighting, noise) that could affect results. All participants used identical equipment (either computers or VR headsets) to eliminate hardware-related discrepancies.
Before the experiment, participants received uniform instructions detailing the experimental procedures to ensure clarity and consistency. The system automatically recorded behavioral metrics, such as click counts, task completion times, and interaction frequencies, ensuring data accuracy and reliability.
Data analysis
The data analysis utilized multiple statistical methods to ensure the scientific rigor and accuracy of the results. Behavioral data were analyzed using one-way analysis of variance to compare differences between the experimental and control groups in metrics such as interaction frequency. Knowledge test results were evaluated with paired t tests and variance analysis to measure differences in short- and long-term memory, providing insights into the system’s impact on knowledge acquisition. Participant feedback was analyzed using Likert scale ratings and thematic analysis to identify strengths and areas for improvement in the interactive annotation design. All experiments adhered to strict controls for environmental factors, device consistency, and data collection processes to ensure the reliability and validity of the results.
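The one-way analysis of variance used for the behavioral data can be illustrated with a minimal sketch. The function name and the interaction counts below are hypothetical and serve only to show the computation; they are not the study's data or analysis script.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over a list of samples."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical interaction counts per participant (illustrative only)
experimental = [14, 17, 15, 18, 16]
control = [9, 11, 10, 12, 8]
f_stat = one_way_anova_f([experimental, control])
print(f"F(1, 8) = {f_stat:.1f}")
```

With only two groups, the F statistic equals the square of the corresponding pooled-variance t statistic, so the ANOVA and an independent t test lead to the same conclusion in that case.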
To assess the impact of the interactive annotation system on knowledge acquisition, independent t tests were conducted on post-test scores from the experimental and control groups. The results showed a t-value of 21.41 with a p value <0.05, indicating that the experimental group significantly outperformed the control group in knowledge acquisition. The experimental group achieved an average post-test score of 84.74 (SD = 5.07), while the control group scored 64.62 (SD = 4.30). The two standard deviations are of similar magnitude, with the experimental group’s slightly larger, so the data indicate a consistent upward shift in performance rather than a reduction in individual differences (Fig. 14).
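As a rough check on the group comparison, the t statistic and a standardized effect size can be recomputed from the reported means and standard deviations. The sketch below is illustrative only: the function names are hypothetical, and the group size of 50 per condition is a placeholder assumption, since group sizes are not stated in this section.

```python
import math

def t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic (unpooled form) from summary statistics."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error

def cohens_d(mean1, sd1, mean2, sd2):
    """Cohen's d using the root mean square of the two SDs (assumes equal group sizes)."""
    return (mean1 - mean2) / math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)

# Means and SDs are the reported post-test values; n = 50 per group is a
# hypothetical placeholder, not a figure taken from the study.
t_stat = t_from_summary(84.74, 5.07, 50, 64.62, 4.30, 50)
effect = cohens_d(84.74, 5.07, 64.62, 4.30)
print(f"t = {t_stat:.2f}, d = {effect:.2f}")
```

Under this placeholder group size the statistic comes out at roughly 21.4, which is in line with the reported value; a different group size would change t but not d.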
To explore the relationship between behavioral data and knowledge acquisition, multiple regression analysis was performed. The model revealed that annotation click rate had a significant positive association with knowledge acquisition (coefficient = 0.87, p < 0.001), indicating that higher interaction frequency with the annotation system was linked to better knowledge absorption. Conversely, task completion time showed a negative association with knowledge acquisition (coefficient = −0.29, p = 0.028), suggesting that longer task durations could lead to cognitive overload, reducing learning effectiveness. The model explained about 63% of the variance in knowledge acquisition (R² = 0.627, adjusted R² = 0.619), which further supports the role of behavioral metrics in explaining differences in knowledge acquisition (Fig. 15).
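The regression step can be sketched as an ordinary least-squares fit with two predictors, followed by the usual adjustment of R² for the number of predictors. This is a minimal illustration, not the authors' analysis script: the function names, the noiseless synthetic data, the intercept of 50, and the sample size of 100 are all assumptions made for the example.

```python
def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def r_squared(rows, y, coef):
    """Share of the variance in y explained by the fitted model."""
    pred = [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical (clicks, minutes) pairs; scores are generated without noise
# from the reported coefficients and an assumed intercept of 50.
rows = [(10, 30), (20, 25), (15, 40), (30, 20), (25, 35), (12, 28)]
scores = [50 + 0.87 * c - 0.29 * m for c, m in rows]
coef = fit_ols(rows, scores)
print([round(b, 2) for b in coef], round(r_squared(rows, scores, coef), 3))
print(round(adjusted_r2(0.627, n=100, p=2), 3))  # n = 100 is an assumption
```

Because the synthetic scores contain no noise, the fit recovers the generating coefficients exactly; real data would yield an R² below 1, as in the reported model.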
Qualitative data were collected through interviews, focus group discussions, and feedback surveys. The data were analyzed using coding and thematic extraction methods to gain deeper insights into players’ subjective feedback and support system optimization. The majority of participants reported that the interactive annotation system enhanced the game’s engagement and efficiency in knowledge acquisition. Thematic analysis identified “information richness” and “operational simplicity” as the most frequently mentioned strengths. Suggestions for improvement included: adding more visual cues, reducing repetitive task operations, and providing more feedback in key tasks. These findings contribute to refining the system design for better user experiences.
The results demonstrate that the interactive annotation system significantly enhanced players’ knowledge acquisition. The experimental group achieved an average post-test score of 84.74, markedly higher than the control group’s 64.62 (p < 0.05). Multiple regression analysis further showed that annotation click rate was positively associated with knowledge acquisition, with post-test scores increasing by 0.87 points for every additional click (p < 0.001), whereas task completion time was negatively associated with it, with post-test scores decreasing by 0.29 points for every additional minute (p = 0.028). These results suggest that frequent interactions with the annotation system effectively enhance learning outcomes, while prolonged task durations may induce cognitive overload, diminishing knowledge absorption. Overall, the interactive annotation system demonstrated its ability to improve player engagement, promote knowledge retention, and optimize learning efficiency. The opposite signs of the click-rate and task-duration coefficients underscore the importance of balancing interaction frequency and task length for achieving optimal learning outcomes. This study highlights the significant advantages of using a cognitive load-based and adaptive interactive annotation system for improving knowledge acquisition in serious games for cultural heritage education. These findings provide a foundation for further development and refinement of serious game design.
Discussion
This study validates a three-tier interactive annotation model grounded in CLT for serious games in cultural-heritage education. In the Shimao Ruins case, the experimental group outperformed the control group in short-term recall (84.7% vs 64.6%) and long-term retention (72.3% vs 54.1%), indicating that tiered information presentation can better align information complexity with learners’ processing capacity and thereby reduce the risk of overload. Multiple regression further supports this interpretation: interaction frequency positively predicted knowledge acquisition (β = 0.87, p < 0.001), while longer task completion time, an indirect marker of inefficient processing, was negatively associated with knowledge acquisition (β = −0.29, p = 0.028). Together, these results substantiate the model’s mechanism of reducing extraneous load and fostering germane load in line with CLT.
The findings suggest applicability beyond a single site. Use of a three-dimensional VR environment strengthened spatial cognition, yielding a 28% higher accuracy in recalling architectural relationships than a 2D interface. This is consistent with prior reports that immersive contexts can enhance encoding and memory of complex cultural content, while also increasing engagement. Nevertheless, technical and contextual constraints remain. In particular, AR deployment at physical heritage sites introduces environmental distractions that are difficult to regulate, posing a practical barrier to delivering stable cognitive load in situ.
User feedback highlights where adaptivity should be deepened. While most participants rated the three-tier model favorably, 12% of novices reported frustration during advanced tasks, suggesting that a fixed progression can overchallenge part of the cohort. Incorporating dynamic difficulty adjustment—guided by real-time load indicators such as eye-tracking or interaction telemetry—could operationalize CLT’s dynamic regulation principle, individualizing task complexity and pacing to sustain optimal load.
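One way such dynamic difficulty adjustment could work is a simple rule that moves a learner between the three tiers based on interaction telemetry from the previous task. The sketch below is purely hypothetical: the telemetry fields, thresholds, and function name are assumptions for illustration and do not describe the implemented system, which used a fixed progression.

```python
from dataclasses import dataclass

TIERS = ("basic", "intermediate", "advanced")

@dataclass
class TaskTelemetry:
    completion_minutes: float  # time spent on the last task
    annotation_clicks: int     # interactions with annotations during the task
    correct: bool              # whether the task was answered correctly

def next_tier(current: str, t: TaskTelemetry,
              slow_minutes: float = 8.0, min_clicks: int = 3) -> str:
    """Pick the tier for the next task from the last task's telemetry."""
    i = TIERS.index(current)
    # Slow and incorrect: treated here as a sign of possible overload
    overloaded = t.completion_minutes > slow_minutes and not t.correct
    # Fast, correct, and actively engaged: ready for more complexity
    fluent = (t.correct and t.completion_minutes <= slow_minutes
              and t.annotation_clicks >= min_clicks)
    if overloaded:
        i = max(i - 1, 0)
    elif fluent:
        i = min(i + 1, len(TIERS) - 1)
    return TIERS[i]

# A novice struggling on an advanced task is stepped down one tier.
print(next_tier("advanced", TaskTelemetry(12.0, 1, False)))
```

The thresholds would need to be calibrated empirically, and richer load indicators such as eye-tracking could replace or complement the two telemetry signals used here.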
The pedagogical and design implications are twofold. First, the hierarchical task structure (basic → intermediate → advanced) appears to be a viable blueprint for balancing authenticity of cultural material with cognitive accessibility, enabling novices to establish foundational schemas before engaging with higher-order interpretations (e.g., stratigraphy, cross-period synthesis, or contested narratives). Second, pairing the annotation model with VR can improve motivation and time-on-task without sacrificing accuracy, offering a scalable pathway for dynamic scene reconstruction, cross-cultural exchange modules, and presentation of complex historical narratives.
Limitations temper these conclusions. Although powered adequately, the sample may not capture the variability associated with diverse cultural and educational backgrounds. External validity should therefore be tested with larger and more heterogeneous cohorts, including multilingual users. Moreover, our prototype emphasized VR; AR integration was not systematically examined. Future work should (i) evaluate hybrid AR/VR pipelines—especially on mobile—to improve flexibility in field settings, (ii) implement and test real-time adaptive mechanisms for difficulty and guidance, and (iii) assess longitudinal transfer (e.g., months-later application tasks) to gauge durable conceptual change beyond recognition and recall.
In sum, the three-tier interactive annotation model, implemented within an immersive serious game environment, demonstrably improves knowledge acquisition and long-term retention while adhering to CLT principles by minimizing extraneous load and maximizing germane processing. With targeted advances in adaptivity and AR integration, the framework is well positioned for broader, cross-cultural deployment in heritage education.
Data availability
Due to human-subject privacy and consent restrictions, raw data are not publicly available. De-identified derivatives (aggregate statistics/coding framework) and analysis scripts are available from the corresponding author upon reasonable request, subject to a data use agreement for non-commercial academic research.
References
Krtalić, M. & Alon, L. Personal cultural heritage management: a conceptual framework for constructing and curating cultural identities through personal collections. J. Doc. 80, 1238–1257 (2024).
Zhang, Y., Zheng, Q., Tang, C., Liu, H. & Cui, M. Spatial characteristics and restructuring model of the agro-cultural heritage site in the context of culture and tourism integration. Heliyon 10, e30227 (2024).
Maxim, R. I. & Arnedo-Moreno, J. Identifying key principles and commonalities in digital serious game design frameworks: scoping review. JMIR Serious Games 13, e54075 (2025).
Moghadam, S. G. et al. A mobile serious game about diabetes self-management: design and evaluation. Heliyon 10, e37755 (2024).
Damianova, N. & Berrezueta-Guzman, S. Serious games supported by virtual reality: literature review. IEEE Access 13, 38548–38561 (2025).
Gurbuz, S. C. & Celik, M. Serious games in future skills development: a systematic review of the design approaches. Comput. Appl. Eng. Educ. 30, 1591–1612 (2022).
Chen, S., Liu, X., Bakhir, N. M. & Yu, Y. A study of the effects of different animations on germane cognitive load during intangible cultural heritage instruction. Educ. Inf. Technol. 29, 19163–19196 (2024).
Robledo-Castro, C., Castillo-Ossa, L. F. & Corchado, J. M. Artificial cognitive systems applied in executive function stimulation and rehabilitation programs: a systematic review. Arabian J. Sci. Eng. 48, 2399–2427 (2023).
Daoudi, I. Learning analytics for enhancing the usability of serious games in formal education: a systematic literature review and research agenda. Educ. Inf. Technol. 27, 11237–11266 (2022).
Tang, Y., Liang, H. & Zhan, J. The application of metaverse in occupational health. Front. Public Health 12, 1396878 (2024).
Kayed, J. E. et al. Serious game for radiotherapy training. BMC Med. Educ. 24, 463 (2024).
Li, J. & Lv, C. Exploring user acceptance of online virtual reality exhibition technologies: a case study of Liangzhu Museum. PLoS ONE 19, e0308267 (2024).
Hung, L. et al. Best practices and practical strategies for co-designing virtual reality with Indigenous peoples: a scoping review protocol. PLoS ONE 20, e0325111 (2025).
Bekele, M. K., Pierdicca, R., Frontoni, E., Malinverni, E. S. & Gain, J. A survey of augmented, virtual, and mixed reality for cultural heritage. J. Comput. Cult. Herit. 11, 1–36 (2018).
Liao, X. 3D visualization design of digital intelligent landscape environment based on wireless network security. Sci. Rep. 15, 18387 (2025).
Rodriguez-Garcia, B., Guillen-Sanz, H., Checa, D. & Bustillo, A. A systematic review of virtual 3D reconstructions of cultural heritage in immersive virtual reality. Multimed. Tools Appl. 83, 89743–89793 (2024).
She, L., Wang, Z., Tao, X. & Lai, L. The impact of color cues on the learning performance in video lectures. Behav. Sci. 14, 560 (2024).
Tokuno, J. et al. Teaching chest tube insertion by blended learning: a multi-dimensional analysis. Surg. Innov. 31, 92–102 (2024).
Liu, D. The effects of segmentation on cognitive load, vocabulary learning and retention, and reading comprehension in a multimedia learning environment. BMC Psychol. 12, 4 (2024).
Xu, X., Dong, R., Li, Z., Jiang, Y. & Genovese, P. V. Research on visual experience evaluation of fortress heritage landscape by integrating SBE–SD method and eye movement analysis. Heritage Sci. 12, 281 (2024).
Cassani, R., Novak, G. S., Falk, T. H. & Oliveira, A. A. Virtual reality and non-invasive brain stimulation for rehabilitation applications: a systematic review. J. Neuroeng. Rehabil. 17, 147 (2020).
Li, Z. & Li, J. Learner engagement in the flipped foreign language classroom: definitions, debates, and directions of future research. Front. Psychol. 13, 810701 (2022).
Li, M. & Yu, Z. A systematic review on the metaverse-based blended English learning. Front. Psychol. 13, 1087508 (2023).
Fan, Z., Chen, C. & Huang, H. Immersive cultural heritage digital documentation and information service for historical figure metaverse: a case of Zhu Xi, Song Dynasty, China. Heritage Sci. 10, 148 (2022).
Acknowledgements
This work was supported in part by the China National Social Science Foundation (grant number 22BSH122) and the Ministry of Education of China under the Humanities and Social Sciences Foundation Grant No. 23YJA760004.
Author information
Contributions
Wei Zhou conceptualized the research, designed the study, and wrote the main manuscript text. Wei Zhou also conducted the data analysis and interpretation. Yanmin Xue supervised the research, provided critical revisions to the manuscript, and contributed to the study design. Shuang Wang contributed to the development and validation of the cognitive load model, as well as the experimental design. Minnan Cang and Kai Qi developed the virtual education platform (Shimao Cloudscape) and supported the experimental procedures. Zhi Qiao contributed to the data collection and provided valuable feedback on the manuscript. All authors reviewed and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhou, W., Xue, Y., Wang, S. et al. Cognitive load-based multi-level annotation model for knowledge acquisition in heritage games. npj Herit. Sci. 13, 559 (2025). https://doi.org/10.1038/s40494-025-02133-8