Introduction

Research background and motivations

In the information age, the integration of technology and education continues to deepen1,2,3. As a crucial component in promoting students’ comprehensive development, physical education (PE) must keep pace with the times. Traditional PE methods exhibit significant shortcomings, particularly in their lack of interdisciplinary integration and innovation. The emergence of the Science, Technology, Engineering, Arts, and Mathematics (STEAM) educational concept presents new opportunities for PE development. Meanwhile, the convolutional neural network (CNN), a core artificial intelligence technology, demonstrates substantial application potential in PE4,5,6,7.

STEAM education emphasizes the integration of science, technology, engineering, arts, and mathematics to cultivate students’ innovative thinking and practical abilities. Incorporating the STEAM concept into PE helps students deeply understand the scientific principles behind physical activities, enhances technical application skills, and develops aesthetic and logical thinking through practice. This interdisciplinary teaching approach enriches PE content while stimulating students’ learning interest and improving teaching effectiveness8,9,10,11.

However, current PE faces numerous challenges in achieving interdisciplinary integration12. On one hand, PE teachers’ limited interdisciplinary knowledge makes it difficult to incorporate science, technology, engineering, arts, and mathematics into PE teaching. On the other hand, traditional teaching methods and tools fail to meet the demands of interdisciplinary teaching. Therefore, exploring new teaching approaches and technological solutions becomes crucial for promoting interdisciplinary integration in PE13. CNN, as a powerful image processing technology, shows promising prospects in PE14,15,16. By training CNN models, high-precision recognition and analysis of sports movements can be achieved, providing scientific and accurate guidance for PE teaching. For instance, in basketball instruction, applying CNN to analyze shooting form enables precise identification of problems and targeted training. In soccer teaching, using CNN to analyze running trajectories and passing routes helps better understand game rhythm and tactical coordination17,18,19,20.

Research object

In the current information age, PE faces new opportunities and challenges21. Traditional PE often focuses on skill instruction and physical training while neglecting the integration of interdisciplinary knowledge and the cultivation of students’ innovative abilities22. The STEAM educational concept, emphasizing multidisciplinary integration, provides a new perspective for PE. Meanwhile, CNN’s outstanding performance in image recognition and processing brings new possibilities for movement analysis and skill identification in PE22,23,24.

This work aims to apply CNN to PE teaching design based on the STEAM concept, exploring an innovative PE teaching model. The model integrates knowledge from arts, science, technology, engineering, and mathematics, utilizing CNN technology to design diverse teaching projects and activities. This enables students to learn and master sports knowledge and skills through practice while developing innovative capabilities and comprehensive competencies. Specifically, the work first develops a CNN-based motion recognition system that achieves real-time capture and precise analysis of movements. The system employs CNN structures, including convolutional layers, pooling layers, and fully connected layers, to extract features and classify sports movement images or videos, providing accurate data support for teaching. Its working principles, model training process, and performance evaluation indicators are detailed in subsequent sections. This work then incorporates the STEAM educational concept to design interdisciplinary teaching projects, such as sports science experiments and sports technology innovation practices. In sports science experiments, students apply biomechanics and exercise physiology knowledge to study the effects of movement on physical functions. Sports technology innovation projects encourage students to combine computer science and engineering knowledge to develop intelligent sports training equipment or software. These projects follow educational psychology principles to stimulate students’ learning interest and initiative while cultivating interdisciplinary thinking and practical skills. Detailed project designs, implementation procedures, and expected outcomes are further elaborated in the research model section. Finally, empirical research and case studies evaluate the model’s effectiveness and feasibility. 
The empirical research adopts mixed quantitative and qualitative methods to collect data on students’ academic performance, skill levels, and innovative thinking abilities, employing statistical methods for analysis. Case studies select typical teaching examples to thoroughly examine instructional processes and outcomes, summarizing experiences and lessons. These investigations provide theoretically grounded and practically significant guidance for PE reform and development.

Literature review

In recent years, with the continuous evolution of the education domain and the rapid development of technology, the PE field has been actively exploring innovative teaching models and methods. During this process, the research on PE teaching design based on the STEAM concept has gradually become an important research direction in this field. The integration of cutting-edge technologies such as CNN has injected brand-new impetus into the innovation of PE.

Hsu pioneered the application of CNN and logistic regression models from deep learning (DL) to predict sports competition outcomes25. Hsu innovatively integrated gambling odds and actual match scores to construct sports-related datasets, leveraging CNN’s powerful image pattern recognition capabilities to extract data features. Meanwhile, the logistic regression model accounted for differences between strong and weak teams, ultimately achieving accurate match outcome predictions. This research held significant importance as it demonstrated CNN’s exceptional performance in sports data processing and provided crucial technical insights for subsequent PE research. For instance, when analyzing students’ sports skill training effectiveness, this method could be adapted to deeply mine training data to identify key factors influencing skill improvement, thereby supporting the development of personalized training programs.

Seti et al. focused on Chinese sports text named entity recognition using a character-level graph convolutional network and self-attention mechanism models26. They treated each character in sports texts as a node and constructed relational edges between characters to extract internal entity structure information. They also employed self-attention models to capture hierarchical semantic information, significantly improving named entity recognition accuracy. Although the research concentrated on text processing, its synergistic use of CNN and self-attention mechanisms offered new inspiration for interdisciplinary PE research. In sports theory instruction, educators could reference this method to help students organize complex sports knowledge texts, enhancing comprehension and memorization efficiency of sports concepts, historical events, and other knowledge. Moreover, this method could improve students’ information processing capabilities in interdisciplinary learning contexts.

Wang et al. (2023) developed a sports health monitoring system integrating wearable devices, cloud computing, and DL27. The system employed CNN, long short-term memory networks (LSTM), and self-attention mechanisms to construct models that achieved high-precision predictions of athletes’ health conditions on large-scale datasets. Athletes can obtain cloud reports via mobile devices to promptly adjust training or competition plans. This research result is directly related to concerns about student athletic health in PE. During PE teaching, similar systems enable teachers to monitor students’ exercise physiological data in real time, including heart rate and exercise intensity. Concurrently, comprehensively analyzing movement performance can identify potential sports risks, providing strong support for personalized PE teaching and exercise safety assurance. This ensures safe and scientific implementation of PE activities.

Tenhovirta et al. explored cross-age peer tutoring in technology-enhanced STEAM projects, focusing on eighth graders mentoring seventh graders in programming skills28. The study found that cross-age tutoring remarkably enhanced practical production and STEAM education while imposing higher demands on tutors’ multiple competencies. Case studies further verified key tutors’ pivotal role and substantial contributions in peer learning networks. These findings provided practical paradigms for STEAM-based PE teaching design. PE teaching can incorporate cross-age or peer-assisted learning models, organizing students to collaboratively participate in sports project design and equipment innovation activities. By cultivating teamwork spirit, communication skills, and innovative thinking through peer interaction, PE teaching formats and learning experiences can be enriched.

Lee (2021) investigated the impact of STEAM education implementation in middle school PE classes on students’ self-directed learning abilities and attitudes toward PE29. In the experimental design, the experimental group received STEAM-based PE teaching, while the control group followed the traditional teaching mode. Results demonstrated significant improvements in learning attitudes and autonomous learning abilities among experimental group students. This research directly validated the positive outcomes of integrating STEAM educational concepts into PE teaching. Meanwhile, it showed that multidisciplinary PE teaching enriched teaching content and formats, stimulated student interest and initiative, and enhanced comprehensive competencies, providing direct practical evidence for STEAM-based PE teaching design.

Perales and Aróstegui examined emerging issues in incorporating arts and humanities into core technology curricula30. They conducted an in-depth analysis of the rise of the STEAM educational concept, its classroom implementation, and its influence in the social, economic, and educational fields. They proposed that, while ensuring the economic rationality of education, a more socialized and democratic educational perspective should be adhered to. This promotes the transformation of education towards a more humanized direction and cultivates well-rounded talents who meet societal needs. These findings provided macro-level theoretical guidance and value orientation for STEAM-based PE teaching design. In the design of PE teaching programs, educators must not only focus on the integration of multidisciplinary knowledge and technological applications, but also consider the holistic development of society and education by cultivating students’ social responsibility, humanistic literacy, and innovative spirit, thereby enabling PE to better serve students’ comprehensive growth and society’s long-term development.

Although the STEAM educational concept and CNN have each achieved certain research progress in the PE field, studies combining the two remain scarce. STEAM education emphasizes interdisciplinary integration and focuses on developing students’ comprehensive competencies and innovative abilities, while CNNs demonstrate significant advantages in image processing and data mining. Their combination holds potential for bringing innovative transformations to PE. The scarcity of current research stems primarily from the complexity of interdisciplinary collaboration and various challenges in technological application. To promote deeper integration between PE and technological innovation, future research needs to strengthen interdisciplinary collaboration, integrate multidisciplinary resources and methodologies, and thoroughly explore practical implementation pathways for STEAM and CNN in PE. This approach can enhance the quality and effectiveness of PE while injecting new vitality into technological innovation in the field and meeting society’s urgent demand for comprehensively developed talents.

Research model

Research theoretical framework

CNN

A CNN is a feedforward neural network that includes convolutional computation and has a deep structure; it is one of the key algorithms in DL. In the context of PE teaching design, CNN can be applied in several areas:

Action recognition and analysis: CNN can be used to recognize and analyze movements in sports teaching videos, extracting key features to provide data support for teaching design31.

Student performance evaluation: By employing CNN to evaluate students’ movements in real time, personalized feedback and suggestions can be provided, helping students to better master their skills.

Movement pattern prediction: CNN can analyze students’ movement data to predict future movement patterns, allowing for the development of targeted teaching plans in advance32.

Figure 1 illustrates the structure of a CNN.

Fig. 1. Structure of a CNN (a: the overall structure of the CNN, b: the structure of the convolutional layer).

In Fig. 1, the application of CNN in this work is not confined to the technical definition level, but is based on the cognitive construction needs of PE. This work constructs a three-level theoretical framework of “feature representation-cognitive modeling-teaching decision”. From the perspective of technical essence, the design of the convolutional layer of CNN follows the hierarchical representation law of motor skills. The local receptive field mechanism of the 3 × 3 convolution kernel is used to extract primary features such as joint space coordinates and action trajectories. For example, in the analysis of basketball shooting, the spatiotemporal correlation between the wrist flipping angle and the elbow bending arc is captured; the pooling operation simulates the feature abstraction process of the human visual system, and realizes the hierarchical refinement of the action pattern through maximum pooling dimensionality reduction. This hierarchical representation mechanism theoretically coincides with the skill learning law of “decomposition-integration” in PE.
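The local receptive field of a 3 × 3 kernel described above can be illustrated with a minimal sketch in plain Python. The 5 × 5 input patch and the edge-detecting kernel are hypothetical values chosen for illustration; they are not taken from the trained model, whose kernels are learned from data. As in most DL frameworks, the operation shown is strictly a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    Each output value depends only on a kernel-sized patch of the input,
    which is the local receptive field of that output position.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Hypothetical 5 x 5 grayscale patch containing a vertical edge.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hypothetical 3 x 3 kernel that responds to left-to-right intensity increases.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

print(conv2d_valid(image, kernel))  # each row is [3.0, 3.0, 0.0]
```

The output is large where the patch straddles the edge and zero where the input is uniform, which is the sense in which a small kernel extracts a primary, local feature.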

In the PE scenario, the application concept of CNN transcends the technical category of traditional computer vision, evolving into a “data-driven cognitive tool.” The model constructs an action feature library through a transfer learning mechanism (pre-trained ResNet-50), converting standard action paradigms into a computable feature vector space. When a student’s action video is input, CNN generates a deviation heat map through feature comparison, such as the identification of “abnormal force timing” in football passing actions. This technical processing essentially transforms the teaching evaluation criteria of motor skills into algorithmic logic. More importantly, the model deeply couples the feature extraction results of CNN with the interdisciplinary cognitive goals of STEAM education. The joint angle data output by CNN is transformed into scientific inquiry questions through biomechanical modeling (such as “the physical relationship between the release angle and shooting hit rate”). The action defects are transformed into optimization parameters of training equipment through engineering system thinking (such as the angle adjustment range of three-dimensional (3D) printing auxiliary devices). This makes the technical analysis results the cognitive starting point of interdisciplinary teaching.
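The feature comparison step can be sketched as follows, assuming that a student's action and the standard action have each already been converted into one feature vector per body region. The paper does not specify the comparison metric; cosine similarity is used here purely as an assumption, and the region names and vectors are hypothetical stand-ins for features that would come from the pre-trained network.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (norm_a * norm_b)


def deviation_scores(student, reference):
    """Return 1 - cosine similarity for every region in the reference library.

    A score near 0 means the student's feature matches the standard action;
    larger scores mark regions that would appear "hot" in a deviation map.
    """
    return {
        region: 1.0 - cosine_similarity(student[region], reference[region])
        for region in reference
    }


# Hypothetical two-dimensional features (real features would be much longer).
reference = {"wrist": [1.0, 0.0], "elbow": [0.0, 1.0]}
student = {"wrist": [1.0, 0.0], "elbow": [1.0, 1.0]}

scores = deviation_scores(student, reference)
for region, score in scores.items():
    print(region, round(score, 3))
```

In this toy case the wrist feature matches the reference while the elbow feature deviates, so only the elbow would be flagged for feedback.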

The innovation of this theoretical framework lies in constructing a mapping mechanism of “technical representation-educational semantics”. In basketball instruction, CNN technology identifies the “shooting release point offset” as a key technical feature. The STEAM integration module transforms this finding into a structured teaching sequence: scientific verification (biomechanical principles) → technical execution (trajectory visualization) → engineering solutions (corrective devices) → mathematical analysis (deviation rate modeling). Ultimately, it creates a self-reinforcing cycle of “technical analysis → cognitive development → practical innovation” for players. This application concept enables CNN to go beyond the positioning of traditional action recognition tools and become a cognitive bridge connecting the representation of motor skills and the essence of interdisciplinary knowledge. Its theoretical value lies in proving that the representation learning ability of DL can form synergy with the cognitive laws of PE. This provides a theoretical basis for intelligent PE that combines technical rigor and educational adaptability.

STEAM concept

The STEAM concept is a comprehensive educational approach that integrates knowledge from five fields: Science, Technology, Engineering, Arts, and Mathematics. It aims to cultivate students’ innovative thinking and problem-solving ability33. In the context of PE teaching design, the application mode of the STEAM concept is as follows:

Interdisciplinary integration: It combines PE with biomechanics, exercise physiology, computer science, and other disciplines to construct comprehensive PE teaching designs. For instance, when teaching basketball shooting techniques, biomechanical knowledge is introduced to analyze the mechanical principles of body movements during shooting, such as the relationship between arm angle, leg thrust force, and shooting accuracy. Exercise physiology knowledge explains the body’s energy metabolism and fatigue recovery mechanisms during movement, enabling students to understand how to properly arrange training intensity and rest periods for optimal results. Computer science motion capture technology records students’ shooting motions, with data analysis identifying movement flaws and providing targeted improvement suggestions, achieving organic integration of multidisciplinary knowledge in PE teaching.

Inquiry-based learning: It encourages students to discover and solve problems through practice, exploration, and collaboration, thereby cultivating innovative thinking and teamwork skills. For example, when organizing football teaching activities, teachers may pose an inquiry question such as “how to improve the scoring rate through optimized team tactics.” Student groups conduct practical explorations by testing different tactics in actual matches while observing and analyzing match data, including pass completion rate, possession time, and shot attempts. This process requires students to collaborate, discuss tactical approaches, analyze match outcomes, and summarize lessons learned to continuously optimize tactical coordination while enhancing problem-solving abilities and team spirit.

Personalized learning: It tailors educational resources and pathways according to students’ interests, strengths, and needs to stimulate learning motivation. Although not unique to the STEAM concept, it finds effective application in STEAM-based PE teaching. For example, students who are interested in sports science experiments and have strong scientific inquiry abilities can be offered more in-depth experiment projects, such as investigating how exercise intensity affects heart rate variability (HRV), in which they design experiments and collect and analyze data independently. Students who are good at applying technology can use programming knowledge to develop sports training assistance software, such as a mobile phone application that records running speed, distance, and calorie consumption. Artistically talented students can analyze the aesthetic qualities of sports movements, for example by choreographing expressive cheerleading routines or dance performances. Each group thus pursues specialized learning in its areas of strength and interest, which enhances educational outcomes34. Figure 2 displays the STEAM educational concept.

Fig. 2. STEAM educational concept.

In Fig. 2, the application of the STEAM concept in this work does not remain at the level of conceptual superposition. Instead, it constructs a 3D application framework of “interdisciplinary integration-cognitive construction-practical innovation” based on the cognitive laws of PE. From the perspective of educational essence, the infiltration of the STEAM concept in the model follows the constructivist learning theory, regarding motor skills as the material carrier of multidisciplinary knowledge. The science dimension reveals the physical mechanism of action execution through biomechanical modeling, such as the mechanical relationship between the parabolic trajectory and release angle in basketball shooting. The technology dimension realizes the visualization and analysis of sports data with digital tools, such as the dynamic modeling of football passing trajectories through Python programming. The engineering dimension optimizes the design of training systems based on action analysis results, such as 3D printing an adjustable-angle shooting auxiliary bracket. The arts dimension extracts aesthetic features from motor coordination, such as the rhythm evaluation of volleyball spiking actions. The mathematics dimension achieves quantitative analysis of skill mastery through statistical modeling, such as establishing a regression model of passing success rate and speed.
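The science-dimension example (the relationship between the parabolic trajectory and the release angle) can be made concrete with a small sketch. It assumes ideal projectile motion without air resistance or ball spin, which is a simplification; the shooting distance and release height are hypothetical. Solving y(d) = d·tan(θ) − g·d² / (2·v²·cos²(θ)) = h for the speed v gives the release speed needed for the ball centre to pass through the hoop.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def required_release_speed(distance_m, height_diff_m, angle_deg):
    """Speed needed to reach a point `distance_m` away and `height_diff_m` higher.

    Derived from the drag-free trajectory equation; raises ValueError when the
    release angle is too flat for the ball to reach the target height.
    """
    theta = math.radians(angle_deg)
    denom = 2 * math.cos(theta) ** 2 * (distance_m * math.tan(theta) - height_diff_m)
    if denom <= 0:
        raise ValueError("release angle too flat for this target")
    return math.sqrt(G * distance_m ** 2 / denom)


def height_at(x_m, speed, angle_deg):
    """Height of the ball above the release point after travelling x_m horizontally."""
    theta = math.radians(angle_deg)
    return x_m * math.tan(theta) - G * x_m ** 2 / (2 * speed ** 2 * math.cos(theta) ** 2)


# Hypothetical shot: hoop 4.2 m away and 1.0 m above the release point.
for angle in (45, 50, 55):
    speed = required_release_speed(4.2, 1.0, angle)
    print(f"release angle {angle} deg -> required speed {speed:.2f} m/s")
```

Students could compare these idealized values with the release angles measured from video to discuss why some angles demand less speed than others.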

In the PE scenario, the application of the STEAM concept breaks through traditional disciplinary boundaries, forming a teaching logic chain of “data-triggered, problem-oriented, and project-driven”. The model creates real problem situations through motion feature data extracted by CNN (such as the clustering results of football players’ running trajectories) to guide students in carrying out hierarchical interdisciplinary inquiries. Students first verify the rationality of actions based on scientific principles, then use technical tools for data visualization, and then design solutions through engineering thinking. Artistic aesthetic evaluation is integrated along the way, and the effect is finally evaluated with mathematical methods. For example, in volleyball teaching, to address the “hitting point deviation” identified by CNN, students use sports anatomy to analyze muscle force patterns (Science), use Unity to develop a virtual simulation system (Technology), design a resistance adjustment device (Engineering), evaluate action fluency (Arts), and establish a correlation model between deviation rate and error rate (Mathematics). In completing such integrated projects, they develop motor skills and interdisciplinary literacy together.
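The mathematics step of the volleyball example, a correlation model between deviation rate and error rate, might start from a Pearson correlation coefficient. The following is a minimal sketch; the choice of Pearson correlation is an assumption, and the sample values are hypothetical classroom data invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical per-student values: hitting point deviation rate and spike error rate.
deviation_rate = [0.05, 0.10, 0.15, 0.20, 0.30]
error_rate = [0.08, 0.12, 0.20, 0.22, 0.35]

print(round(pearson_r(deviation_rate, error_rate), 3))
```

A coefficient close to 1 would suggest that larger hitting point deviations accompany more errors, giving students a quantitative starting point for discussion rather than proof of causation.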

The innovation of this application concept lies in constructing a semantic alignment mechanism of “educational objectives-technical functions”. In the data collection stage, the cognitive objectives of various STEAM disciplines guide the fusion strategy of multi-source data. In the model architecture design, engineering system thinking and mathematical optimization methods ensure the adaptability of technical tools to teaching needs. In the teaching implementation process, the project-based learning framework transforms the STEAM concept into operable cognitive scaffolding. This design makes STEAM no longer an additional element of technical application but the core logic running through the entire life cycle of the model. Its theoretical value lies in proving that interdisciplinary education concepts can form synergy with DL technology, providing an educational paradigm that combines scientificity and innovation for PE.

Model construction

Based on the theoretical foundation outlined above, this work constructs a research model aimed at optimizing PE teaching design using CNN technology. The model utilizes data collection, processing, and analysis to provide scientific guidance and support for PE teaching. Additionally, by integrating the STEAM educational concept, the model aims to promote interdisciplinary learning among students and cultivate their innovative thinking and problem-solving abilities35. The commonly used activation functions in this context are the sigmoid function and the ReLU function. Their equations are as follows:

$$f(x)=\frac{1}{1+{e}^{-x}}$$
(1)
$$f(x)=\max(0,x)=\begin{cases}0, & x<0\\ x, & x\ge 0\end{cases}$$
(2)
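Equations (1) and (2) translate directly into code. The sketch below uses only the Python standard library; the branch on the sign of the input in `sigmoid` is an implementation detail added here to avoid floating-point overflow and does not change the function defined in Eq. (1).

```python
import math


def sigmoid(x):
    """Eq. (1): 1 / (1 + e^(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # safe because x < 0
    return e / (1.0 + e)


def relu(x):
    """Eq. (2): 0 for negative inputs, x otherwise."""
    return max(0.0, x)


for value in (-2.0, 0.0, 2.0):
    print(value, round(sigmoid(value), 4), relu(value))
```

The sigmoid squashes any input into the interval (0, 1), whereas ReLU passes positive inputs unchanged and sets negative inputs to zero.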

In Eqs. (1) and (2), \(x\) denotes the input to the activation function. However, the feature maps produced by the convolutional layers impose a large computational load, which slows processing significantly and increases the likelihood of overfitting. Pooling layers therefore play a crucial role in reducing data volume and computation. Common pooling operations include average pooling and max pooling36,37,38,39. Figure 3 illustrates the principle of pooling operations in the model.

Fig. 3. Model pooling operations (a: Average Pooling, b: Max Pooling).
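The two pooling operations can be sketched in a few lines. The 2 × 2 window with stride 2 and the 4 × 4 feature map are hypothetical choices for illustration; the paper does not state the window size used in the model.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling: the window and the stride are both `size`."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            if mode == "max":
                row.append(max(window))
            elif mode == "avg":
                row.append(sum(window) / len(window))
            else:
                raise ValueError("mode must be 'max' or 'avg'")
        pooled.append(row)
    return pooled


# Hypothetical 4 x 4 feature map.
feature_map = [[1, 3, 2, 4],
               [5, 7, 6, 8],
               [9, 11, 10, 12],
               [13, 15, 14, 16]]

print(pool2d(feature_map, mode="max"))  # [[7, 8], [15, 16]]
print(pool2d(feature_map, mode="avg"))  # [[4.0, 5.0], [12.0, 13.0]]
```

Either operation reduces the 16 input values to 4, which is where the saving in subsequent computation comes from.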

Figure 3 illustrates that the application of pooling layers significantly optimizes the computational efficiency of the CNN model40. The fully connected layer serves as the endpoint structure of CNN, typically responsible for classifying image features. It commonly employs a Softmax regression classifier, expressed as:

$${S}_{j}=\frac{{e}^{{z}_{j}}}{\sum_{k}{e}^{{z}_{k}}}$$
(3)
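A minimal sketch of Eq. (3) follows. Subtracting the largest score before exponentiating is a standard numerical safeguard added here; it leaves the result of Eq. (3) unchanged because the common factor cancels in the ratio. The input scores are hypothetical.

```python
import math


def softmax(z):
    """Eq. (3): turn a list of class scores into probabilities that sum to 1."""
    shift = max(z)  # stabilizes exp() without changing the ratios
    exps = [math.exp(value - shift) for value in z]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three action classes.
probabilities = softmax([1.0, 2.0, 3.0])
print([round(p, 3) for p in probabilities])
```

The class with the largest score receives the largest probability, and the predicted action is the class with the highest value.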

In Eq. (3), \(j\) denotes the output neuron whose class probability \({S}_{j}\) is computed, and the sum over \(k\) runs over all output neurons. The pre-activation output \({z}_{j}\) of neuron \(j\) is calculated as:

$${z}_{j}=\sum_{m}{w}_{jm}{x}_{m}+{b}_{j}$$
(4)

\({w}_{jm}\) represents the weight connecting input \({x}_{m}\) to neuron \(j\), and \({b}_{j}\) represents the bias of that neuron41. The working procedure of a CNN model includes forward propagation and backward propagation. Forward propagation refers to the process during network training in which the computed results from the previous layer serve as inputs to the subsequent layer, yielding the classification scores and probabilities for each class. Backward propagation involves propagating the error from the output layer backward toward the input layer, thereby optimizing the model parameters42. The equation for error calculation is:

$$E(W,b)=\frac{1}{2}\sum_{i=1}^{N}{\parallel {t}_{i}-{y}_{i}\parallel }^{2}$$
(5)

\(t\) represents the true value of the sample, \(i\) denotes the sample index, \(N\) is the number of samples, and \(y\) represents the predicted value of the sample. \(W\) and \(b\) respectively denote the weights and biases of the neural network layer43. The update equations for the two are:

$${W}^{l}={W}^{l}-\eta \frac{\partial }{\partial {W}^{l}}E(W,b)$$
(6)
$${b}^{l}={b}^{l}-\eta \frac{\partial }{\partial {b}^{l}}E(W,b)$$
(7)
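The error function of Eq. (5) and the updates of Eqs. (6) and (7) can be illustrated on the smallest possible case: a single linear neuron with a scalar output and no activation function. This is a deliberately simplified sketch; in the full CNN the same gradients are obtained layer by layer through backpropagation. The training samples and learning rate are hypothetical.

```python
def predict(w, b, x):
    """Linear neuron: weighted sum of the inputs plus a bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def squared_error(w, b, samples):
    """Eq. (5) for scalar outputs: half the sum of squared residuals."""
    return 0.5 * sum((t - predict(w, b, x)) ** 2 for x, t in samples)


def gradient_step(w, b, samples, eta):
    """One application of Eqs. (6) and (7) with learning rate eta."""
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, t in samples:
        residual = predict(w, b, x) - t  # derivative of the error w.r.t. the output
        for m, xm in enumerate(x):
            grad_w[m] += residual * xm
        grad_b += residual
    new_w = [wi - eta * g for wi, g in zip(w, grad_w)]
    new_b = b - eta * grad_b
    return new_w, new_b


# Hypothetical samples generated by t = 2x + 1.
samples = [([0.0], 1.0), ([1.0], 3.0), ([2.0], 5.0)]
w, b = [0.0], 0.0
for step in range(500):
    w, b = gradient_step(w, b, samples, eta=0.1)
print(round(w[0], 4), round(b, 4), squared_error(w, b, samples))
```

Each step moves the weight and bias against the gradient of the error, so the error shrinks and the parameters approach the values that generated the data.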

In Eqs. (6) and (7), \(l\) denotes the layer index of the neural network, and \(\eta\) denotes the learning rate. On this basis, the designed model can interact effectively with the teaching environment. The model framework combining CNN and STEAM is shown in Fig. 4.

Fig. 4. The model framework combining CNN and STEAM.

Figure 4 shows that the constructed CNN–STEAM model takes cognitive constructivism theory as its philosophical foundation, achieving systematic innovation in PE through the dual-layer coupling of technical architecture and educational concepts. In the technical dimension, the model follows the representation learning principle of DL and constructs a three-level computational framework of “spatiotemporal feature hierarchical extraction-cross-modal knowledge fusion-dynamic decision generation”. The underlying data collection layer uses multi-source heterogeneous data fusion technology. It aims to integrate visual features of sports videos (such as joint space coordinates), time-series signals of physiological sensors (such as HRV), and teaching context data (such as classroom interaction logs) into standardized feature vectors. The middle-layer CNN core module enhances action pattern recognition ability through residual connections and attention mechanisms. The local receptive field design of its convolutional layer conforms to the hierarchical representation law of motor skills. The pooling operation simulates the feature abstraction process of the human visual system. The high-level decision-making module maps the feature space to a set of teaching strategies based on the principle of reinforcement learning, forming an end-to-end computational link of “data-feature-decision”.
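One simple way to turn multi-source data into standardized feature vectors is to concatenate the modalities of each sample and then standardize every feature across samples. The sketch below assumes this approach; the paper does not specify its fusion technique, and the field names (`visual`, `physio`, `context`) and values are hypothetical.

```python
import math


def concatenate(sample):
    """Join the three modalities of one sample into a single flat vector."""
    return list(sample["visual"]) + list(sample["physio"]) + list(sample["context"])


def standardize_columns(rows):
    """Z-score every feature across samples; constant features become 0."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / n)
            for col, m in zip(columns, means)]
    return [[(v - m) / s if s > 0 else 0.0
             for v, m, s in zip(row, means, stds)]
            for row in rows]


# Hypothetical samples: joint coordinates, an HRV value, and an interaction count.
samples = [
    {"visual": [10.0, 20.0], "physio": [60.0], "context": [0.0]},
    {"visual": [30.0, 40.0], "physio": [80.0], "context": [0.0]},
]

fused = standardize_columns([concatenate(s) for s in samples])
print(fused)
```

Standardization puts coordinates, physiological signals, and log counts on a comparable scale before they are passed to later layers; more elaborate fusion schemes would replace the plain concatenation.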

In the educational dimension, the model takes the interdisciplinary integration theory of STEAM education as the framework; meanwhile, it constructs a five-element teaching paradigm of “scientific principle anchoring-technical tool enabling-engineering thinking modeling-artistic aesthetic immersion-mathematical method quantification”. This paradigm breaks through the skill training limitations of traditional PE and regards motor behavior as the material carrier of multidisciplinary knowledge. At the scientific level, the physical mechanism of action execution is revealed through biomechanical modeling. At the technical level, a virtual simulation environment for motor skills is constructed with digital twins technology. At the engineering level, students are guided to optimize the design of training equipment based on action analysis data. At the artistic level, an evaluation system for action coordination is established through sports aesthetics theory. At the mathematical level, statistical learning methods are used to construct a quantitative model of skill mastery. This integration is not a simple superposition of disciplinary knowledge but transforms the motor feature data generated by CNN analysis into driving questions for interdisciplinary inquiry through the project-based learning (PBL) framework. For example, based on the angle deviation data of basketball shooting actions, an integrated learning task of “mechanical principle verification-training tool development-data visualization analysis” is designed.

The innovative essence of the model lies in constructing a two-way enabling mechanism between technology and education. On the one hand, the representation learning ability of CNN provides an objective motor feature representation system for STEAM education, solving the subjectivity problem in action analysis in traditional PE. On the other hand, the interdisciplinary thinking of STEAM injects semantic constraints of educational scenarios into the DL model. Concurrently, it avoids the disconnection between technical application and educational objectives through teaching goal-oriented feature selection (such as incorporating teaching indicators like action economy into model training). This coupling mechanism is embodied as a “three-layer feedback loop”. At the data layer, the annotated data generated by teaching practice continuously optimizes the training set of the CNN model. At the cognitive layer, the disciplinary knowledge graph generated by STEAM projects guides the feature extraction direction of the model. At the decision layer, the teaching effect evaluation results reversely adjust the strategy generation algorithm of the model. This mechanism enables the model to go beyond the category of traditional technical tools and develop into an intelligent teaching system with educational context adaptability. Its theoretical value lies in breaking through the separation state of technical application and educational concepts in PE and providing a transferable theoretical framework for subject teaching innovation in the era of smart education. Figure 5 illustrates the algorithm design of the model.

Fig. 5 The algorithm design of the model.

As shown in Fig. 5, the constructed CNN–STEAM model embeds the STEAM education concept into the technical architecture and practical processes of PE, forming an educationally grounded instructional system design through multi-dimensional integration. At the data level, the model constructs a multi-dimensional dataset guided by interdisciplinary teaching objectives. It collects not only technical parameters such as visual features and movement trajectories of basketball shooting actions but also synchronously acquires physiological indicators like HRV. This data collection paradigm adheres to the “situated cognition” principle in constructivist learning theory. It treats motor behavior as a material carrier of scientific principles (e.g., biomechanics), technical tools (e.g., sensor applications), and mathematical analysis (e.g., trajectory modeling), providing interdisciplinary cognitive materials for teaching design. The model architecture design integrates engineering systems thinking with educational goal constraints. Convolutional layer parameter optimization considers the technical needs of motor feature extraction (e.g., determining the spatial receptive field of 3 × 3 convolution kernels through mathematical derivation). At the same time, it incorporates special requirements of teaching scenarios, such as transforming teaching indicators like “action economy” into loss function constraints of the model. This ensures semantic alignment between the technical architecture and educational objectives.
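The idea of turning a teaching indicator into a loss function constraint can be sketched as a classification loss plus a weighted penalty term. The text does not specify how “action economy” is quantified, so the penalty below (squared excess of an estimated movement cost over a teaching target) and the weight `lam` are purely illustrative assumptions.

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class for one sample."""
    return -math.log(max(probs[label], 1e-12))

def economy_penalty(predicted_cost, target_cost):
    """Hypothetical 'action economy' term: penalize only the cost
    that exceeds the teaching target, and do so quadratically."""
    return max(0.0, predicted_cost - target_cost) ** 2

def total_loss(probs, label, predicted_cost, target_cost, lam=0.1):
    """Classification loss plus a weighted teaching-indicator penalty."""
    return cross_entropy(probs, label) + lam * economy_penalty(
        predicted_cost, target_cost
    )

# Same prediction, with and without excess movement cost.
loss_on_target = total_loss([0.5, 0.5], 0, predicted_cost=1.0, target_cost=1.0)
loss_over_target = total_loss([0.5, 0.5], 0, predicted_cost=2.0, target_cost=1.0)
```

Under this assumed form, `lam` controls how strongly the teaching indicator competes with recognition accuracy during training.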

In the teaching application dimension, the model implements STEAM concepts through a teaching logic chain of “problem-driven, interdisciplinary inquiry, and cognitive construction.” Taking football teaching as an example, the model’s clustering analysis of players’ running trajectories is not merely a technical output. Instead, it is transformed into the teaching problem of “how to improve passing efficiency through mechanical optimization,” guiding students to carry out hierarchical inquiry activities. At the scientific level, sports anatomy knowledge is used to analyze joint force mechanisms. At the technical level, Python programming is employed to visualize trajectory data. At the engineering level, resistance-adjustable passing training devices are designed based on analysis results. At the artistic level, motor aesthetic features are extracted from team running routes. At the mathematical level, regression prediction models for passing success rates are established. This teaching design follows Gagné’s “Nine Events of Instruction” framework: motor feature data generated by CNN create real problem scenarios, and methodological tools from the STEAM disciplines provide cognitive scaffolds. Ultimately, students develop motor skills and comprehensive literacy together while completing interdisciplinary projects.

Educational theory also guides the model training and optimization process. Evaluating model performance with scientific methods such as cross-validation brings the “positivism” principle of educational research into technical development and helps keep feature extraction results consistent with teaching objectives. Parameter adjustment of the Adam optimization algorithm pursues improvements in technical indicators while also considering the particularity of teaching scenarios. For example, when identifying motor defects in adolescents, the learning rate is adjusted to balance model accuracy and fault tolerance, which reflects the application of the “zone of proximal development” theory in technical design. Integrating educational theories into the full lifecycle of the model in this way enables the CNN–STEAM model to go beyond a traditional technical tool and form a closed-loop teaching system of “data collection-feature analysis-instructional intervention-cognitive assessment.” In essence, the integration of STEAM concepts transforms DL technology into an educationally adaptive cognitive tool, providing a new paradigm for PE that combines technical rigor with educational soundness.
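For readers unfamiliar with the Adam update rule, the sketch below applies it to a single scalar parameter so that the role of the learning rate is visible. It is a generic illustration on a toy quadratic objective, with commonly used default hyperparameters assumed; it is not the training configuration of the CNN–STEAM model.

```python
import math

def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    """Minimize a scalar function with Adam, given its gradient."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
x_final = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

The learning rate `lr` bounds the size of each step, so a smaller value gives more cautious updates at the cost of slower convergence, which is the kind of trade-off the paragraph above refers to.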

Experimental design and performance evaluation

Datasets collection

This work constructs an experimental verification system based on the University of Nevada, Las Vegas (UNLV) sports activity dataset, using a systematic design to demonstrate the practical logic and application paradigm of the CNN–STEAM model in PE. Compiled by the University of Nevada, Las Vegas, the dataset contains 5,000 video sequences of 8 sports, including basketball, football, and tennis. Each video has an average duration of 2 min and is recorded at a resolution of 1920 × 1080 and a frame rate of 30 frames per second (fps). The annotation system covers 10 basic action categories (such as shooting, passing, and racket swinging) and action start-stop timestamps. Its multimodal data features (visual images + action labels) provide a standardized benchmark for sports recognition research. The dataset’s coverage of multiple sports makes it possible to verify the model’s generalization ability in different PE scenarios. Meanwhile, its rigorous annotation process (cross-validated by 3 kinematics experts) ensures the reliability of experimental data.

At the model application level, the “Basketball Technology and STEAM Innovation” course for the 2024 class at Guangzhou Sport University serves as the practical carrier. It aims to construct a closed-loop application model of “technical analysis, teaching transformation, and evaluation feedback”. The course adopts a blended teaching framework, consisting of 16 class hours over 8 weeks. A total of 120 students are randomly divided into an experimental group (CNN–STEAM teaching model) and a control group (traditional teaching method). In the specific implementation process, videos of 6 basic actions, such as shooting and dribbling, are first collected from students via Kinect devices, with resolutions matching the dataset standards. After preprocessing, the videos are input into the CNN–STEAM model, which supports teaching application through a three-layer architecture. The convolutional layer (3 × 3 kernel) extracts spatial features such as wrist flipping angles and elbow bending arcs. The pooling layer (max pooling) performs feature dimensionality reduction. The fully connected layer, combined with the STEAM integration module, generates multi-dimensional analysis reports. These reports include action technical parameters (e.g., release speed of 12.3 ± 1.5 m/s), associations with biomechanical principles (e.g., the physical model of parabolic trajectories), and engineering design suggestions (e.g., angle adjustment schemes for 3D-printed auxiliary equipment).
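The first two stages of the architecture described above (a 3 × 3 convolution followed by max pooling) can be illustrated with a minimal pure-Python sketch. The 6 × 6 input, the vertical-edge kernel, the ReLU activation, and the 2 × 2 pooling window are assumptions chosen for readability; the actual layer configuration is given by the model parameters of this work.

```python
def conv2d_3x3(image, kernel):
    """'Valid' 3x3 cross-correlation, as used in most DL libraries."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - 2):
        row = []
        for j in range(w - 2):
            s = 0.0
            for di in range(3):
                for dj in range(3):
                    s += image[i + di][j + dj] * kernel[di][dj]
            row.append(s)
        out.append(row)
    return out

def relu(fmap):
    """Element-wise rectified linear activation."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool_2x2(fmap):
    """Non-overlapping 2x2 max pooling (halves each dimension)."""
    h, w = len(fmap), len(fmap[0])
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, w - 1, 2)]
        for i in range(0, h - 1, 2)
    ]

# Toy 6x6 "frame" with a vertical edge between columns 2 and 3.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
kernel = [[-1, 0, 1]] * 3          # simple vertical-edge detector
features = relu(conv2d_3x3(image, kernel))   # 4x4 feature map
pooled = max_pool_2x2(features)              # [[3.0, 3.0], [3.0, 3.0]]
```

In a trained network the kernel values are learned rather than fixed by hand, and the pooled feature maps are flattened and passed to the fully connected layer.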

In the teaching transformation stage, the project-based learning framework is followed to convert the motion feature data output by the model into interdisciplinary inquiry tasks. For example, based on the “shooting release point offset” issue identified by CNN (28% of samples with a deviation rate > 15%), a “science-technology-engineering-arts-mathematics” integrated project is designed. Students use sports physiology knowledge to analyze the impact of offset on energy consumption (scientific dimension), employ Python visualization tools to generate heat maps of action trajectories (technical dimension), and design adjustable-angle shooting brackets based on finite element analysis (engineering dimension). They also evaluate aesthetic features in terms of action fluency (artistic dimension) and establish a logistic regression model of offset versus hit rate (mathematical dimension). During course implementation, the experimental group uses the interactive teaching platform supported by this model. This allows teachers to invoke real-time personalized guidance schemes generated by the model (such as a correction training plan for a student with a +3° elbow angle deviation). The control group adopts the traditional demonstration-practice-error correction model.
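The mathematical-dimension task (a logistic regression of hit outcome on release-point offset) could be approached along the following lines. The sketch fits the model by plain gradient descent; the ten data points are invented for illustration only and are not measurements from the course. Offsets are expressed in units of 10% (so 1.5 means a 15% offset) to keep the optimization well scaled.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(hit) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y   # gradient of the log-loss
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Hypothetical toy data: offset (units of 10%) and hit (1) or miss (0).
offsets = [0.2, 0.5, 0.8, 1.0, 1.2, 1.5, 1.8, 2.0, 2.5, 3.0]
hits = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

w, b = fit_logistic(offsets, hits)
p_small_offset = sigmoid(w * 0.5 + b)   # estimated hit probability at 5%
p_large_offset = sigmoid(w * 2.5 + b)   # estimated hit probability at 25%
```

With data of this shape, the fitted slope `w` is negative, so the estimated hit probability falls as the offset grows; students can then compare that trend against their own recorded shots.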

The scientific rigor of the experimental design is reflected in a threefold verification mechanism. First, at the dataset level, tenfold cross-validation on the UNLV dataset ensures the model’s generalization ability (mean accuracy of basic action recognition: 91.7%). Second, at the teaching practice level, mixed research methods are used: indicators such as action compliance rate and project completion are collected quantitatively, while student reflection logs and teacher observation records are analyzed qualitatively. Third, at the theoretical level, relying on constructivist learning theory, model analysis data serve as cognitive scaffolds to support students’ progression from action imitation to interdisciplinary innovation. This design not only ensures the rigor of technical verification but also achieves a logical closed loop from model construction to teaching application by embedding a specific course case. Thus, it addresses the practical need for integrating technical tools and educational concepts in PE.
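The tenfold cross-validation procedure can be sketched as follows. The splitting strategy (a seeded shuffle followed by striding) and the `evaluate_fold` callback are illustrative assumptions; the text does not state how the folds were drawn or whether they were stratified by action category.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i
                 for idx in fold]
        yield train, validation

def cross_validate(n_samples, evaluate_fold, k=10, seed=0):
    """Average a per-fold score; evaluate_fold(train, validation) is a
    user-supplied (hypothetical) function that trains on the first
    index list and returns a score measured on the second."""
    scores = [evaluate_fold(train, validation)
              for train, validation in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)

# 5,000 video sequences split into ten folds of 500 validation samples.
splits = list(k_fold_indices(5000, k=10))
```

Each sample appears in exactly one validation fold, so the reported mean accuracy is an average over ten held-out evaluations.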

Experimental environment

The experimental environment is built around the specific needs of this research. As in any scientific research, technological development, or product testing, the environment must satisfy the conditions required by the experiments so that hypotheses can be validated and system performance evaluated accurately. Table 1 shows the hardware information.

Table 1 Experimental hardware information.

Parameters setting

In scientific research and engineering applications, models simulate the behavior of actual systems or processes and predict their future performance. Model parameters are therefore chosen so that the model reproduces the real system accurately and yields reliable predictions. Table 2 shows the parameter design results of the model used.

Table 2 Model parameter design results.

Performance evaluation

Basic performance evaluation of the model

In order to investigate the performance of the proposed model, the designed CNN–STEAM model is compared with standard CNN and Residual Network (ResNet) models. This comparison involves statistical analysis to evaluate the performance improvements achieved by the proposed model, thereby highlighting its advantages. Figure 6 shows the results of the basic performance comparison evaluation.

Fig. 6 The results of the basic performance comparison evaluation: (a) Accuracy; (b) Recall; (c) F1 Score; (d) Response Time.

Figure 6 demonstrates that the designed model significantly outperforms the other two models on the basic performance indicators. Specifically, its accuracy improves by over 21%, recall by over 25%, and F1 score by over 22%, while its response time is reduced by over 27%.
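For reference, the three classification indicators used in this comparison can be computed from true and predicted labels as in the sketch below. Macro-averaging over classes is an assumption, since the text does not state the averaging scheme, and the six labels in the example are invented for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged recall and F1 over all classes."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    recalls, f1_scores = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        recalls.append(recall)
        f1_scores.append(f1)
    return {
        "accuracy": accuracy,
        "recall": sum(recalls) / len(labels),
        "f1": sum(f1_scores) / len(labels),
    }

# Hypothetical labels for six clips.
y_true = ["shoot", "shoot", "pass", "pass", "dribble", "dribble"]
y_pred = ["shoot", "pass", "pass", "pass", "dribble", "shoot"]
metrics = classification_metrics(y_true, y_pred)
```

Response time, the fourth indicator, is a wall-clock measurement and is therefore not derived from the labels.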

Model prediction performance evaluation

Building on the basic performance evaluation, the model is further optimized to enhance its performance more effectively. Following this optimization, the prediction performance of the model is evaluated. Figure 7 presents the results of the prediction performance evaluation.

Fig. 7 The results of the prediction performance evaluation: (a) Accuracy; (b) Recall; (c) F1 Score; (d) Response Time.

Figure 7 suggests that the prediction performance of the designed model has also improved significantly compared to the other two models. Specifically, its accuracy increases by more than 33%, recall by more than 29%, and F1 score by more than 32%, while its response time is reduced by more than 28%.

Evaluation of the teaching application effect of the model

This work takes the “Basketball Technology and STEAM Innovation” course for 2024-level undergraduates at Guangzhou Sport University as the practical carrier, in which the experimental group is taught with the CNN–STEAM model. Motion videos of students are collected via Kinect devices (resolution 1920 × 1080, frame rate 30 fps). After analysis by the model, comprehensive reports are generated, including action features (such as release angle and wrist flipping amplitude) and interdisciplinary suggestions (biomechanical principles, engineering design schemes). Based on these reports, STEAM integration projects are designed. The control group adopts the traditional model of “teacher demonstration-student practice-error correction feedback”. The results of the teaching application effect evaluation of the model are presented in Table 3.

Table 3 Evaluation of the teaching application effect of the model.

As Table 3 shows, the CNN–STEAM model significantly improves teaching effects through the “data-driven-interdisciplinary integration” mechanism. The action technology compliance rate increases by 20.6%. This confirms the model’s ability to identify action defects precisely (e.g., generating personalized correction schemes by extracting the feature of “shooting release point offset” through CNN and combining it with biomechanical analysis). The score for interdisciplinary knowledge application ability increases by 46.2%, indicating that STEAM projects effectively promote the integrated application of multidisciplinary knowledge, such as science and technology. The 33.7% improvement in the innovativeness of project achievements reflects the enabling role of engineering design thinking and artistic aesthetics in motor skill learning.

Discussion

In the current data-driven research paradigm, data processing and analytical capabilities have become crucial elements driving development across various fields, with PE being no exception. This work focuses on designing an efficient DL model to overcome the limitations of traditional PE data processing and analysis, providing strong support for optimizing PE teaching. The work employs CNN as the foundational architecture and incorporates the STEAM educational concept for customized design, constructing the CNN–STEAM model. Comparative evaluations with traditional CNN and residual network models assess the CNN–STEAM model’s performance from both basic and predictive perspectives, using accuracy, recall, F1 score, and response time as quantitative indicators.

In the basic performance evaluation, the CNN–STEAM model demonstrates significant advantages. Compared to traditional CNN and ResNet models, it achieves over 21% improvement in accuracy, more than 25% enhancement in recall, and over 22% increase in F1 score, while reducing response time by over 27%. These results contrast markedly with the performance of traditional models in PE data processing reported in related studies. For instance, Li et al. applied CNN and logistic regression models to predict sports competition outcomes. Although their approach achieved good results on specific tasks, it was not optimized for movement recognition and analysis scenarios in PE teaching. Park et al. studied named entity recognition in sports text using character-level graph convolutional network and self-attention mechanism models, a research focus that differs from this work and does not involve PE teaching applications. The proposed CNN–STEAM model is specifically optimized for movement analysis scenarios in PE teaching. This enables more precise and efficient processing of movement data to provide timely and accurate support for teaching decisions.

After further optimization, the evaluation of the model’s predictive performance reveals that the CNN–STEAM model achieves remarkable improvements. The optimized model demonstrates over 33% increase in accuracy, more than 29% growth in recall, and over 32% enhancement in F1 score, with further reduction in response time. These results indicate that the CNN–STEAM model excels in current motion recognition and analysis tasks. Also, it possesses strong predictive capabilities to anticipate students’ movement development trends, providing substantial support for personalized teaching plan formulation. Compared to the sports health monitoring system integrating multiple technologies proposed by Abdelbaky and Aly, the proposed model focuses more specifically on motion analysis and teaching strategy optimization during PE teaching, demonstrating greater applicability in educational scenarios.

However, the proposed CNN–STEAM model still presents certain limitations. When processing specific types of data, particularly those with complex features or limited quantities, the model’s generalization capability and robustness require further improvement. Future research could adopt approaches similar to Chen et al.’s study combining cross-age peer tutoring with STEAM projects, and could explore diversified data augmentation strategies and model optimization methods to enhance the model’s adaptability in complex data environments. Additionally, considering the critical importance of data privacy and security in education, subsequent studies need to strengthen the model’s data privacy protection capabilities to ensure compliance and security in data usage.

To sum up, the proposed CNN–STEAM model exhibits remarkable advantages in the processing and analysis of PE data. It provides new tools and methods for researchers and technicians in the PE field, effectively promotes PE development, and opens up a new direction for applying DL in this field. However, there is still considerable research space in aspects such as model performance optimization and privacy protection, which is worthy of further in-depth exploration.

Conclusion

Research contribution

The main contribution of this work is the successful design and validation of a novel CNN–STEAM model, which demonstrates exceptional performance in data processing and analysis tasks within the field of PE. Compared with traditional CNN and ResNet models, the CNN–STEAM model achieves significant improvements in both basic and prediction performance. Specifically, it achieves over 20% growth in key indicators such as accuracy, recall, and F1 score, while also effectively reducing response time. This work not only provides robust technical support for data processing in PE but also opens new avenues for the application of DL in this field, offering substantial theoretical and practical significance.

Future works and research limitations

Despite the significant achievements, there are still some limitations and areas for improvement. Future work will focus on optimizing the structure and parameters of the CNN–STEAM model to further enhance its performance. Additionally, it will explore the model’s applicability to a broader range of tasks in PE. Efforts will also be made to improve the model’s generalization ability and robustness to handle more complex and variable data environments. Finally, considering data privacy and security issues, there will be an increased focus on enhancing the model’s privacy protection capabilities to ensure compliance and safety in data usage.