Introduction

Running is frequently associated with lower limb bone stress injuries owing to its repetitive, weight-bearing nature. Epidemiological evidence suggests that the annual injury incidence among runners ranges from 24% to 77%1. Notably, metatarsal stress fractures, although accounting for up to 4% of all sports-related injuries2, are among the most common fractures in runners because of the repetitive impact on the forefoot during running1.

During running, the midfoot and forefoot experience high stress, particularly from mid-stance to push-off. Morphologically, the metatarsals are relatively slender, roughly cylindrical bones compared with other bones. This design facilitates the windlass mechanism, enhances stability, and attenuates impact shock during loading3. However, the same slender configuration also predisposes them to fracture. Mechanical loading of bone causes microdamage to accumulate, and a stress fracture occurs when a bone subjected to high-magnitude cyclic loading cannot repair this microdamage quickly enough4,5.

While bone staple strain gauges can quantify the strain experienced by bones, including the metatarsals6,7 and tibia8,9,10, from which stress can be derived, their invasive nature makes them impractical for assessing and monitoring internal bone loading during running. Mathematical and biomechanical modeling instead offers a feasible alternative for in silico load estimation. Beam theory-based bone models have been used to evaluate stress in the second metatarsal11 and tibia12. However, such geometric models may oversimplify real-world mechanics: they often neglect shape deformation under load and the interactions between soft tissue and bone13,14.

Conversely, surrogate finite element (FE) simulation offers a potentially more accurate biomechanical model for foot stress analysis in running, albeit at a high computational cost, particularly when models are built from personalized geometries. This highlights a critical challenge in achieving both accuracy and efficiency in biomechanical modeling. Consequently, relevant research often faces a dilemma between simplifying geometries and conducting case studies, that is, a trade-off between developing a comprehensive FE model13,14 and undertaking pilot studies with limited statistical power15. To overcome this limitation, advanced techniques for high-fidelity, personalized reconstruction of lower limb musculoskeletal anatomy are promising16. Such reconstruction can leverage low-fidelity real-world measurements, such as those from inertial sensors or smartphone cameras, combined with insights derived from neuromusculoskeletal simulations17.

In recent decades, data-driven approaches have gained considerable popularity, largely owing to advances in computational capability and the refinement of machine learning algorithms. Deep learning algorithms, in particular, are increasingly favored in running biomechanics, reflecting a shift from traditional laboratory-based investigations to real-world estimation17,18. Combined with wearable sensors, these algorithms enable real-time evaluation and circumvent the cumbersome manual feature engineering required by conventional approaches. In running, they have been used to classify gait characteristics such as performance level19, foot posture20, and strike pattern21,22, and to predict sequential data, including metabolic energy expenditure23, ground reaction force24,25, joint kinematics26,27,28, and joint kinetics29,30,31.

In this context, one-dimensional convolutional neural network (CNN1D) and long short-term memory (LSTM) architectures have demonstrated strong capabilities in predicting time-series data for human activity recognition32,33, gait analysis30,34, and running biomechanics24,35. Temporal convolutional networks (TCN) are designed to extract temporal features for sequential data prediction and have proven highly effective in predicting lower limb movements34. Recently, transformers, known for their use in natural language processing tasks such as machine translation and text generation, have been adapted for biomechanical applications36. Central to transformers is the attention mechanism (AM), which has also been integrated into bidirectional LSTM architectures to estimate lower extremity kinematics during running, with promising results28.

Despite the considerable potential of data-driven methods in this field, most existing models focus on predicting external forces (e.g., ground reaction forces) or joint kinetics, which, while useful, do not fully capture internal mechanical stresses within bones. Overuse injuries, such as stress fractures, are primarily driven by repetitive internal bone stresses, which may not be directly inferred from joint kinetics or external forces alone. Previous studies have shown that external load metrics often exhibit weak correlations with internal tibial bone stress37,38, highlighting the need for direct stress estimations at the bone level. Understanding bone stresses in vivo is crucial for injury prevention, particularly in high-impact activities such as running, where excessive localized stress accumulation can lead to microdamage and stress fractures.

This study aims to bridge this gap by developing and validating a novel digital twin framework for predicting metatarsal bone stresses. In this study, a digital twin refers to a computational model that dynamically mirrors the biomechanical behavior of an individual's foot during running, integrating real-time sensor data with subject-specific anatomical and mechanical properties to predict internal bone stresses39,40. The framework integrates personalized FE models, informed by statistical shape modeling (SSM) and free-form deformation (FFD) techniques, with deep learning predictions. The model is trained using wearable sensor accelerations as inputs and FE-predicted bone stresses as outputs. To validate the framework comprehensively, we evaluate its predictive performance across foot strike patterns, comparing rearfoot and non-rearfoot strikers. This approach enables precise estimation of stresses in key foot bones, namely the metatarsals, talus, and calcaneus, thereby advancing running biomechanics and supporting more effective injury prevention through digital twin technology.

Results

Statistical analysis showed no significant differences in von Mises stress between rearfoot and non-rearfoot strikers during the mid-stance (Fig. 1a) and push-off (Fig. 1b) phases. The performance of the six selected models is compared in Supplementary Table 1; LSTM + MLP achieved the highest accuracy. The interpretability of the model with respect to the different IMU sensor inputs is presented in Supplementary Fig. 1. The simulated vertical compression-deformation relationship fell within the standard deviation of the experimental measurements, supporting the validity of the FE model. Table 1 details the accuracy of the predicted mean and peak von Mises stresses. During the mid-stance phase, the prediction errors for mean bone stress, measured by root mean squared error (RMSE), were lower for the calcaneus (0.47 ± 0.29 MPa) and talus (1.14 ± 0.66 MPa) than for M1-M5 (1.24 ± 0.77 MPa, 1.88 ± 1.13 MPa, 1.79 ± 1.07 MPa, 1.85 ± 1.11 MPa, and 2.16 ± 1.30 MPa, respectively). However, the mean absolute percentage error (MAPE) was higher for the calcaneus and talus (mean 22.12%) than for M1-M5 (16.17%). For peak stresses, the calcaneus and talus showed better accuracy in terms of mean absolute error (MAE), RMSE, and MAPE (1.24 MPa, 1.30 MPa, and 8.77%, respectively) than M1-M5 (3.79 MPa, 4.14 MPa, and 11.45%, respectively).

Fig. 1: Von Mises Stress Distribution in Rearfoot and Non-Rearfoot Strikers.

Comparison of von Mises stress between rearfoot (blue) and non-rearfoot (red) strikers for each foot bone during the mid-stance (a) and push-off (b) phases. Note: M1–M5 represent the first to fifth metatarsals.

Table 1 Evaluation of the accuracy of mean and peak von Mises stress predictions during the mid-stance and push-off phases

The analysis showed that, in terms of percentage error, the accuracy of peak bone stress predictions was generally higher than that of mean bone stress predictions (p < 0.05; Table 1 and Fig. 2). The mean MAE and MAPE for peak stresses in M1-M5 during the push-off phase were 6.56 MPa and 11.45%, respectively. Figure 3 shows Pearson correlation coefficients (r) and Bland-Altman plots comparing the predicted stresses with the reference von Mises stresses obtained from FE modeling.

Fig. 2: Violin Plots of MAPE for Predicted Stresses During Gait Phases.

Comparison of MAPE for predicted mean and peak stresses in each region during the mid-stance (a) and push-off (b) phases. Note: MAPE mean absolute percentage error, M1–M5 represent the first to fifth metatarsals.

Fig. 3: Pearson correlation coefficient (r) plot (left) and Bland-Altman plot (right) compare predicted stresses with reference stresses obtained from finite element modeling during the mid-stance (in blue) and push-off (in purple) phases.

Panels a–e and h–l represent the first to fifth metatarsals, while panels f and g denote the calcaneus and talus.

Figure 4 depicts the comparison of prediction accuracy for rearfoot and non-rearfoot strikers during the mid-stance and push-off phases, using RMSE, r, and Bland-Altman plots. The results indicated consistency between the two groups, except for the von Mises stress in M1 and M4 (p = 0.005 and 0.03, respectively) during the mid-stance phase and for stresses in M2 and M3 (p = 0.026 and 0.049, respectively) during the push-off phase, where the non-rearfoot group presented smaller errors.

Fig. 4: Comparison of RMSE between rearfoot and non-rearfoot strike runners.

a–c Mid-stance phase; d–f push-off phase. a, b, d, e Pearson correlation coefficients and Bland–Altman plots. c, f Box plots comparing the groups. Note: RMSE root mean square error, and M1–M5 indicate the first to fifth metatarsals. *p < 0.05, and **p < 0.01.

Discussion

This study presents a novel approach for predicting foot bone stress during running using wearable sensors combined with a domain adaptation-based LSTM algorithm. The reference stress data were derived from FE simulations of models generated from foot scans using FFD-coupled SSM. The approach showed promising results in predicting stresses in M1-M5, the calcaneus, and the talus during the mid-stance phase, and in M1-M5 during the push-off phase. Notably, foot stress was evaluated using low-cost, convenient sensors and scanners, underscoring the approach's potential for future implementation.

Previous studies often simplified computational models11,12,13,14 or limited participant numbers15,41,42 because of computational cost, potentially compromising statistical power. This study overcomes these limitations by reconstructing surrogate FE models for bone stress evaluation, using comprehensive volumetric models based on detailed, person-specific geometries of the foot and ankle joints without reducing the sample size. This was achieved through SSM coupled with FFD, as proposed and validated in a previous study43. Specifically, a personalized 3D foot-ankle model was built by generating the foot surface with SSM, which then informed FFD-based bone reconstruction. The in silico simulation results presented in this study are consistent with previous findings regarding second metatarsal stress14.

Research in biomechanics and sports medicine has sought to identify bone stress thresholds above which the risk of injury, particularly stress fracture, increases. These thresholds are often linked to repetitive loading cycles that exceed the bone's capacity for repair, leading to microdamage accumulation4,44. In running, metatarsal stress fractures are more likely when peak bone stress during repetitive impact loading consistently exceeds critical values, which can vary with factors such as bone density, strain rate, and physical condition5. However, precisely identifying harmful bone stress thresholds remains an ongoing challenge4,5. Our model contributes to this effort by enabling real-time monitoring and prediction of bone stress, which could help establish more personalized and accurate thresholds for runners. By estimating bone stresses accurately, such models could inform personalized training programs, reduce the risk of stress fractures, and enhance rehabilitation protocols through precise control of stress levels.

CNN, LSTM, and TCN architectures have demonstrated strong capabilities in predicting biomechanical variables from time-series data, such as acceleration or joint kinematics18,31,45. For instance, kinematic and kinetic parameters have been investigated to predict tibial stress fractures in running46,47. LSTM, in particular, is well suited to capturing long-term dependencies in time-series data, and in the present comparison the LSTM + MLP model outperformed the attention-based LSTM + AM variant. This study demonstrated that integrating an LSTM-MLP model with domain adaptation-based transfer learning can enhance prediction performance. The proposed model accurately predicted peak stresses during the mid-stance phase of running (RMSE 3.33 ± 1.5 MPa, r = 0.83 ± 0.04) and the push-off phase (RMSE 7.19 ± 1.17 MPa, r = 0.84 ± 0.02). Prior studies have focused on predicting joint contact forces to better understand bone stress variations during running31,48,49. Building on this foundation, the present study contributes to the existing body of evidence16,17,50,51 by demonstrating the feasibility of monitoring internal bone stress, which could support the prediction of overuse running injuries.

In relative terms, the deep learning model was more accurate at predicting peak stresses than mean stresses, as evidenced by lower MAPE values. This indicates that the proposed pipeline is particularly adept at capturing peak loading conditions on bones rather than average loads. Zandbergen et al.37 reported no strong correlation between acceleration and internal tibial bone loads, and ground reaction force features also showed weak associations with tibial loads during running38. Together, these prior findings and our model's superior accuracy in predicting peak stresses underscore the importance of peak characteristics when training data-driven algorithms to estimate internal bone loads. This aligns with fatigue failure theory, which identifies repeated peak stresses, rather than average loads, as primary contributors to microdamage accumulation and injury (e.g., stress fractures) under cyclic loading52. The model's ability to capture the relationship between acceleration and bone stress, a task where traditional methods often struggle, further highlights the utility of peak-focused approaches for injury risk prediction.

Physics-based methods often require substantial simplifications, which can reduce their reliability53. Our use of an ML model is justified by its ability to process large datasets and provide real-time predictions, which is crucial for practical, scalable applications in running biomechanics. This study underscores the strong generalizability of the proposed data-driven model to both rearfoot and non-rearfoot strikers. Different strike patterns exhibit distinct biomechanical characteristics54, suggesting pattern-specific biodynamic adaptations. We found no statistically significant difference in peak stresses between rearfoot and non-rearfoot strikers, which is consistent with previous findings11,14. However, ground reaction forces may differ between these cohorts, underlining the crucial role of personalized foot geometries in stress simulation. Internal loading, possibly adapted through mechanobiology, may not be accurately represented by external forces alone during running55. Data-driven machine learning approaches aim to bridge this gap, as shown in this study and other recent studies31,56. Previous studies have used various data-driven approaches to estimate knee57 and ankle31 joint moments and contact forces, tibial bone loading56, and Achilles tendon stress58. To our knowledge, this is the first study to employ transfer learning-based deep learning with wearable technology to predict internal foot loading. Our approach can be integrated into current wearable sensor-based biomechanical assessments, offering a scalable solution for personalized injury prevention and management, while paving the way for future research on its applicability across diverse populations and sports activities.

In our correlation analysis, we observed a systematic trend in the residuals, suggesting that a linear relationship may not fully capture the underlying data structure. This pattern indicates that the residuals, or their variance, change with the predicted values, meaning that the errors are not consistent across all levels of the independent variable. The presence of such trends underscores the importance of assessing both correlation and agreement between predicted and reference values when evaluating model performance. Because errors of opposite sign at the low and high ends of the range can partially cancel, such systematic bias may make the overall mean difference appear smaller than the errors at the extremes; thus, although the Pearson correlation indicates a strong linear relationship, the model's accuracy may not be uniform across the full range of data59. The observed bias highlights areas where the model could be refined, particularly in predicting extreme values. Future work should focus on improving the model's calibration across the entire data range, possibly by incorporating non-linear modeling techniques or adjusting the model to better account for the variability observed in the residuals.

Although this study presents promising findings, it has limitations. Because loading conditions were derived from plantar pressure during barefoot running, future studies should incorporate footwear as a covariate to enhance the model's applicability. Furthermore, training foot shape models under various weight-bearing conditions, such as different gait patterns and load distributions, could introduce more variation into the dataset and potentially yield more accurate and generalizable models60. Since only male runners were included, future studies should also evaluate the performance and generalizability of the proposed approach in female runners to account for sex-specific biomechanical differences and broaden the applicability of the findings. Additionally, the running stance phases were simulated using three quasi-static models, which may oversimplify foot-ankle mechanics; explicit dynamic FE modeling may represent foot-ankle biodynamics more accurately in future studies.

In summary, this study presents a novel predictive model for foot bone stress that leverages wearable sensors and an LSTM with domain adaptation. The model offers a cost-effective alternative to traditional biomechanical analyses. Using personalized 3D foot models, our approach achieves high accuracy in predicting foot bone stress during the stance phase, which is crucial for preventing injuries among runners. The model's effectiveness for both rearfoot and non-rearfoot strikers highlights its potential for personalized assessments. Nevertheless, the study's limitations underscore the need for further validation in a more diverse population. Overall, our findings represent an advance in integrating machine learning with running biomechanics and clinical practice. This work contributes to digital health by providing accessible, data-driven insights for injury prevention in running, enhancing the potential for personalized healthcare solutions.

Methods

Participants

Following recommendations from a prior evidence-based study18, we recruited fifty male participants, comprising 38 rearfoot and 12 non-rearfoot strikers (age: 22.7 ± 3.9 years; height: 1.76 ± 0.06 m; mass: 67.7 ± 9.6 kg; BMI: 21.8 ± 2.7 kg/m²). Recruitment was facilitated via social media and by distributing posters in universities and running clubs. All participants engaged in recreational running and maintained a minimum weekly mileage of 20 km. None had experienced lower limb musculoskeletal injuries in the preceding six months. Participants were free to withdraw from the study at any time without providing a reason. In accordance with the Declaration of Helsinki, this study was approved by the ethics committee of Ningbo University (RAGH20201137), and written informed consent was obtained from all participants prior to the commencement of the experiments.

Data acquisition

Figure 5 provides an overview of the study's flow diagram. Participants were given 10–15 min to warm up, which included running on a treadmill at a self-selected pace for a few minutes, followed by stretching exercises, familiarization with the experimental setting, and calibration of their running pace. Plantar pressure was captured using a Footscan® plate system (RSscan International, Belgium, frequency: 350 Hz), securely positioned at the center of a pathway. The design of the plate, with consistent surrounding dimensions, ensured optimal running comfort. Plantar pressure data were gathered using the mid-gait protocol61, in which participants ran at a self-selected speed. The starting position was adjusted so that the right foot landed accurately on the sensor platform on the fourth step. Foot strike pattern was classified using the strike index, defined as the position of the center of pressure at initial contact expressed as a fraction of foot length measured from the heel; a strike index of 0–0.33 indicated a rearfoot strike62, and higher values a non-rearfoot strike. Additionally, two nine-axis inertial sensors (IMeasureU V1, New Zealand, frequency: 100 Hz, weight: 12 g) were affixed to the foot dorsum and the distal anteromedial tibia. These sensors recorded triaxial acceleration data, which were synchronized with the plantar pressure measurements using a gait event detection algorithm63. Barefoot running was chosen to eliminate the influence of footwear on plantar pressure measurements and to allow a direct assessment of internal foot loading. Three valid trials of the right (dominant) foot were selected based on a consistent self-selected pace, a confirmed foot strike pattern, and high data integrity, with complete and accurate plantar pressure and acceleration data.
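The strike-index rule can be illustrated with a short Python sketch. It assumes that the center-of-pressure position at initial contact and the heel position are already available as coordinates along the foot's long axis (in mm); the function names and example coordinates are hypothetical and are not part of the Footscan® software.

```python
def strike_index(cop_at_contact_mm: float, heel_mm: float, foot_length_mm: float) -> float:
    """Center-of-pressure position at initial contact as a fraction of foot
    length measured from the heel (0 = heel, 1 = toe tip)."""
    if foot_length_mm <= 0:
        raise ValueError("foot length must be positive")
    return (cop_at_contact_mm - heel_mm) / foot_length_mm


def classify_strike(si: float, rearfoot_max: float = 0.33) -> str:
    """Binary rule: strike index 0-0.33 -> rearfoot, otherwise non-rearfoot."""
    return "rearfoot" if si <= rearfoot_max else "non-rearfoot"


# Hypothetical example: a 260 mm foot with the COP 60 mm or 150 mm ahead of the heel.
print(classify_strike(strike_index(60.0, 0.0, 260.0)))   # rearfoot (SI ~ 0.23)
print(classify_strike(strike_index(150.0, 0.0, 260.0)))  # non-rearfoot (SI ~ 0.58)
```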

Fig. 5: A hybrid biomechanical modeling approach coupled with data-driven methods predicts von Mises stress in the foot bones from inertial sensors during running.

a Acquisition of sensor data from the foot and ankle joint; b Generation of foot-ankle models from foot scans, informed by statistical shape modeling (SSM) coupled with free-form deformation (FFD); c Application of a data-driven approach, using inertial sensor data as inputs and bone stress as outputs; d Finite element simulation of bone stress during the mid-stance and push-off phases of running. Note: IMU inertial measurement unit, SSM statistical shape modeling, PC principal component, FFD free-form deformation.

An Easy-Foot-Scan machine (OrthoBaltic, Lithuania) was utilized to capture the 3D surface contours of participants’ feet. During the scanning process, participants were instructed to stand still with their feet positioned shoulder-width apart, distributing their weight evenly between both legs, while placing the right foot on the scanner’s surface. This procedure adhered to a pre-established methodology64. The scanned foot surface data served as input for a pipeline combining SSM and FFD, as detailed in Xiang et al.43. A pre-trained statistical shape model, based on principal component analysis (PCA) with skin measurements as inputs, was employed to generate subject-specific foot surfaces. These surfaces were then used to reconstruct internal bone meshes via FFD. A summary diagram has been included in Supplementary Fig. 2 to further elucidate the process.
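To make the SSM and FFD steps more concrete, the following Python sketch shows a minimal PCA point-distribution model and a trivariate Bernstein free-form deformation in the style of Sederberg and Parry. It is not the pipeline of Xiang et al.43: it assumes that training shapes are already in point-to-point correspondence and rigidly aligned, and that the displaced control lattice is given (fitting the lattice so that a template skin matches the subject-specific SSM surface is not shown). All class and function names are hypothetical.

```python
import numpy as np
from math import comb


class PCAShapeModel:
    """Minimal PCA point-distribution model. Each training shape is a (k, 3)
    array of corresponding, pre-aligned surface points."""

    def __init__(self, n_modes):
        self.n_modes = n_modes

    def fit(self, shapes):
        X = np.asarray(shapes, dtype=float).reshape(len(shapes), -1)
        self.mean_ = X.mean(axis=0)
        # Right singular vectors of the centred data matrix are the principal modes.
        _, s, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.modes_ = vt[: self.n_modes]                          # (n_modes, 3k)
        self.variances_ = s[: self.n_modes] ** 2 / (len(X) - 1)  # variance per mode
        return self

    def project(self, shape):
        """Least-squares mode weights b for a new shape (modes are orthonormal)."""
        return self.modes_ @ (np.ravel(shape) - self.mean_)

    def reconstruct(self, b):
        """Surface points mean + sum_j b_j * mode_j, reshaped to (k, 3)."""
        return (self.mean_ + np.asarray(b) @ self.modes_).reshape(-1, 3)


def control_lattice(lo, hi, shape=(3, 3, 3)):
    """Undeformed control points on a regular grid spanning the box [lo, hi]."""
    axes = [np.linspace(a, b, k) for a, b, k in zip(lo, hi, shape)]
    return np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)  # (l+1, m+1, n+1, 3)


def ffd(points, lo, hi, control):
    """Trivariate Bernstein FFD: map points inside [lo, hi] through a
    (possibly displaced) control lattice."""
    lo, hi = np.asarray(lo, dtype=float), np.asarray(hi, dtype=float)
    stu = (np.asarray(points, dtype=float) - lo) / (hi - lo)  # local coords in [0, 1]^3
    basis = []
    for axis, n_ctrl in enumerate(control.shape[:3]):
        deg = n_ctrl - 1
        u = stu[:, axis]
        basis.append(np.stack([comb(deg, i) * u**i * (1 - u) ** (deg - i)
                               for i in range(deg + 1)], axis=1))  # (points, deg + 1)
    # Tensor-product Bernstein weights contracted with the control points.
    return np.einsum("pi,pj,pk,ijkd->pd", basis[0], basis[1], basis[2], control)
```

With an undeformed lattice, `ffd` returns the input points unchanged (Bernstein polynomials reproduce linear functions), so only the lattice displacement carries the subject-specific deformation. One plausible use is to fit that displacement on the template skin and then apply the same lattice to the template bone vertices.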

Foot and ankle joint computational modeling

Figure 6 illustrates the FE modeling and validation process. Volumetric meshes were generated from the surface geometries in HyperMesh 2020 (Altair Engineering, Inc., Troy, USA) as follows. The soft tissue and inner bone surfaces were first meshed using triangular elements (size: 3 mm). A volume Boolean operation was then performed to obtain the encapsulated soft tissue geometry, which was meshed using tetrahedral elements (C3D4). Subsequently, the bones were meshed with tetrahedral elements. A mesh convergence test was conducted to minimize discretization error and determine an appropriate mesh size, with element sizes ranging from 5 mm to 1 mm in 10% intervals65. The resulting models contained 53,000–56,000 nodes and 273,000–278,000 elements. The meshes were assembled in Abaqus 2022 (Simulia, Dassault Systèmes, USA), and models representing the quasi-static gait phases were then simulated and solved.
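The convergence criterion can be expressed as a small helper. The sketch below assumes a 5% relative-change tolerance on peak von Mises stress between successive refinements; both the tolerance and the example stress values are hypothetical and are not reported in this study.

```python
def converged_mesh_size(sizes_mm, peak_stress_mpa, tol=0.05):
    """Return the coarsest element size whose peak stress changes by less than
    `tol` (relative) when refined to the next finer size; None if no pair qualifies."""
    runs = sorted(zip(sizes_mm, peak_stress_mpa), reverse=True)  # coarse -> fine
    for (size, stress), (_, finer_stress) in zip(runs, runs[1:]):
        if abs(stress - finer_stress) / abs(finer_stress) < tol:
            return size
    return None


# Hypothetical peak stresses (MPa) for element sizes from 5 mm down to 1 mm.
sizes = [5.0, 4.0, 3.0, 2.0, 1.0]
peaks = [10.0, 11.0, 11.4, 11.5, 11.55]
print(converged_mesh_size(sizes, peaks))  # 4.0
```

Choosing the coarsest converged mesh keeps the per-participant solve time low, which matters when fifty personalized models must be simulated.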

Fig. 6: Finite element modeling and validation.

a Geometry acquisition from foot scanning and reconstruction, informed by statistical shape modeling (SSM) coupled with free-form deformation (FFD); b Finite element simulation for the initial contact, mid-stance, and push-off phases during running, employing plantar pressure as the loading condition; c Model validation, performed by comparing the vertical compression-deformation relationship and the plantar pressure during standing of five reconstructed models with experimental measurements.

The materials used to reconstruct the FE models were assumed to be homogeneous, isotropic, and linearly elastic (see Table 2). Young's modulus (E) and Poisson's ratio (ν) were set to 7300 MPa and 0.3 for bone66, and 1.15 MPa and 0.45 for soft tissue41, respectively. Slip ring connectors were used to represent the plantar fascia, connecting and gliding between the medial and lateral processes of the calcaneal tuberosity and the base of the metatarsals42. The Achilles tendon force was calculated in OpenSim67 as the combined force of the soleus and the medial and lateral gastrocnemius muscles, and was applied as an axial force to the posterior tuberosity of the calcaneus. During simulations, the encapsulated soft tissue and bones were bound together using a tie constraint.
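With homogeneous, isotropic, linear elastic materials, stress follows from strain through Hooke's law, and the von Mises stress reported throughout this study is a scalar summary of the resulting stress tensor. The Python sketch below evaluates both for the bone properties given above (E = 7300 MPa, ν = 0.3); the strain input is an illustrative uniaxial state, the function names are hypothetical, and an FE solver computes the equivalent quantities internally.

```python
import numpy as np


def hooke_isotropic(strain, E, nu):
    """Isotropic linear elasticity: sigma = lambda * tr(eps) * I + 2 * mu * eps."""
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))  # first Lame parameter
    mu = E / (2.0 * (1.0 + nu))                     # shear modulus
    strain = np.asarray(strain, dtype=float)
    return lam * np.trace(strain) * np.eye(3) + 2.0 * mu * strain


def von_mises(stress):
    """Von Mises equivalent stress sqrt(3/2 * s:s), with s the deviatoric stress."""
    stress = np.asarray(stress, dtype=float)
    s = stress - np.trace(stress) / 3.0 * np.eye(3)
    return float(np.sqrt(1.5 * np.sum(s * s)))


E_BONE, NU_BONE = 7300.0, 0.3  # MPa and dimensionless, as in the text

# Strain state produced by a 5 MPa uniaxial stress along x.
sigma_axial = 5.0
eps = np.diag([sigma_axial / E_BONE,
               -NU_BONE * sigma_axial / E_BONE,
               -NU_BONE * sigma_axial / E_BONE])
stress = hooke_isotropic(eps, E_BONE, NU_BONE)
print(round(von_mises(stress), 6))  # 5.0
```

For uniaxial loading the von Mises stress equals the applied stress, which makes this a convenient sanity check on the formula.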

Table 2 Material properties representation for different parts in foot-ankle models

Plantar pressure was applied directly to the FE models as the loading condition. Plantar pressure during the initial contact, mid-stance, and push-off phases of stance was measured separately for the toe, forefoot, midfoot, and rearfoot regions, as defined in the Footscan® software (Gait v7.0, RSScan International, Belgium). The boundary condition was set by fixing the 3D displacement of the proximal top surface of the models. Loading was applied incrementally to mimic the cumulative effect of the initial contact and mid-stance phases on the subsequent phases. The FE models were validated by comparing the vertical compression-deformation relationship and the peak plantar pressure during standing (0.136 ± 0.01 MPa vs. 0.143 ± 0.01 MPa) between the simulation results from five participants and experimental measurements reported in a previous study68, as depicted in Fig. 6c.

Data-driven approaches

Triaxial acceleration data were normalized and padded to 300 data points, allocating 100 points to each phase (initial contact, mid-stance, and push-off). Consequently, the input features were defined as:

$${x}_{t}={[{x}_{1}\left(t\right),\ldots ,{x}_{i}\left(t\right),\ldots ,{x}_{n}(t)]}^{T}\in {R}^{3* n}$$
(1)

where \({x}_{t}\) is the concatenated input vector at time step \(t\), \({x}_{i}(t)\) is the triaxial acceleration measured by the i-th sensor, \(T\) denotes the transpose, and \({R}^{3* n}\) denotes the 3n-dimensional real space.
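One way to obtain the fixed-length input of Eq. (1) is to time-normalize each phase separately and stack the phases. The Python sketch below assumes linear interpolation to 100 samples per phase and a segment array whose six columns hold the triaxial accelerations of the two sensors (3n = 6 for n = 2); the study's exact normalization and padding procedure may differ, and the function names are hypothetical.

```python
import numpy as np


def resample_phase(segment, n_points=100):
    """Linearly time-normalize a (samples, channels) segment to n_points samples."""
    segment = np.asarray(segment, dtype=float)
    t_old = np.linspace(0.0, 1.0, segment.shape[0])
    t_new = np.linspace(0.0, 1.0, n_points)
    return np.column_stack([np.interp(t_new, t_old, segment[:, c])
                            for c in range(segment.shape[1])])


def build_input(phases, n_points=100):
    """Stack initial-contact, mid-stance and push-off segments into (3 * n_points, 3n)."""
    return np.vstack([resample_phase(p, n_points) for p in phases])


# Hypothetical phase lengths at 100 Hz; 6 channels = 2 sensors x 3 axes.
rng = np.random.default_rng(0)
phases = [rng.normal(size=(n, 6)) for n in (12, 20, 15)]
x = build_input(phases)
print(x.shape)  # (300, 6)
```

Resampling each phase separately keeps phase boundaries aligned at fixed indices (0, 100, and 200) across trials, which is what allows a fixed 100-point allocation per phase.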

In this study, the number of sensors (n) was 2, and the output features \({y}_{t}\in \,{R}^{28}\). The response features comprised the mean and maximum von Mises stresses of the calcaneus and talus at initial contact, of the first to fifth metatarsals, calcaneus, and talus during the mid-stance phase, and of the first to fifth metatarsals during the push-off phase, giving 2 × (2 + 7 + 5) = 28 features in total. Each input and output feature was standardized independently by removing the mean and scaling to unit variance69. The mean and standard deviation computed from the training/validation set were used to center and scale the corresponding test set to prevent data leakage.

The models were trained participant-wise and validated using leave-one-participant-out cross-validation, so that no data from the held-out participant were exposed during training. This provides a realistic estimate of how well the model generalizes to unseen runners.
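The leave-one-participant-out scheme and the training-only standardization can be combined as in the Python sketch below. It assumes inputs of shape (trials, 300, 6), outputs of shape (trials, 28), and one participant ID per trial; inputs are standardized per channel and outputs per feature, which is one possible reading of the per-feature standardization described above. Function names and data are hypothetical.

```python
import numpy as np


def standardize_fold(X_train, Y_train, X_test, Y_test, eps=1e-8):
    """Z-score using statistics from the training fold only (no test-set leakage).
    Inputs: per channel over trials and time; outputs: per stress feature."""
    x_mu, x_sd = X_train.mean(axis=(0, 1)), X_train.std(axis=(0, 1)) + eps
    y_mu, y_sd = Y_train.mean(axis=0), Y_train.std(axis=0) + eps
    return ((X_train - x_mu) / x_sd, (Y_train - y_mu) / y_sd,
            (X_test - x_mu) / x_sd, (Y_test - y_mu) / y_sd, (y_mu, y_sd))


def leave_one_participant_out(X, Y, participant_ids):
    """Yield (participant, X_tr, Y_tr, X_te, Y_te, (y_mu, y_sd)) per held-out participant."""
    ids = np.asarray(participant_ids)
    for pid in np.unique(ids):
        test = ids == pid
        yield (pid, *standardize_fold(X[~test], Y[~test], X[test], Y[test]))


# Hypothetical data: 10 participants x 3 trials, 300 samples x 6 channels, 28 outputs.
rng = np.random.default_rng(0)
ids = np.repeat(np.arange(10), 3)
X = rng.normal(size=(30, 300, 6))
Y = rng.normal(size=(30, 28))
for pid, X_tr, Y_tr, X_te, Y_te, (y_mu, y_sd) in leave_one_participant_out(X, Y, ids):
    pass  # train on (X_tr, Y_tr), predict X_te, then map back with pred * y_sd + y_mu
print(X_tr.shape, X_te.shape)  # (27, 300, 6) (3, 300, 6)
```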

For this investigation, six scenarios were designed for model training: TCN, CNN, LSTM, CNN + LSTM, LSTM + MLP (multilayer perceptron networks), and LSTM + AM. The selected architectures were chosen for their established effectiveness in biomechanical time-series prediction, computational efficiency, and suitability for short, high-frequency wearable sensor data. The formula for the forget gate in the LSTM model is expressed as:

$${f}_{t}=\sigma \left({W}_{f}\left[{h}_{t-1},{x}_{t}\right]+{b}_{f}\right)$$
(2)

where \(\sigma\) represents the sigmoid function, \({x}_{t}\) is the input at time step \(t\), \({h}_{t-1}\) is the hidden state from the previous time step, \({W}_{f}\) is the forget-gate weight matrix applied to the concatenation \([{h}_{t-1},{x}_{t}]\), and \({b}_{f}\) is the forget-gate bias. The cell state \({C}_{t}\) and hidden state \({h}_{t}\) are updated as:

$${C}_{t}={f}_{t}{C}_{t-1}+{i}_{t}\,{\widetilde{C}}_{{t}}$$
(3)
$${h}_{t}={o}_{t}\mathrm{tanh}({C}_{t})$$
(4)

where \({i}_{t}\) is the input gate, \({\widetilde{C}}_{t}\) denotes the candidate for the cell state at time step \(t\), and \({o}_{t}\) represents the output gate.
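Eqs. (2)–(4) can be traced in a single NumPy time step. The sketch below assumes the common convention of stacking the forget, input, and output gates and the candidate cell state into one weight matrix applied to the concatenation of the previous hidden state and the current input, with each bias inside its nonlinearity; the products in Eqs. (3) and (4) are element-wise. It illustrates a standard LSTM cell and is not the trained TensorFlow model; the hidden size is hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. W has shape (4H, H + D) and stacks the forget, input,
    output and candidate weights; b has shape (4H,)."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b  # pre-activations from [h_{t-1}, x_t]
    f_t = sigmoid(z[:H])                       # forget gate, Eq. (2)
    i_t = sigmoid(z[H:2 * H])                  # input gate
    o_t = sigmoid(z[2 * H:3 * H])              # output gate
    c_tilde = np.tanh(z[3 * H:])               # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde         # Eq. (3), element-wise
    h_t = o_t * np.tanh(c_t)                   # Eq. (4)
    return h_t, c_t


# Run a randomly initialized cell over one 300-sample, 6-channel input sequence.
rng = np.random.default_rng(0)
H, D = 8, 6                                    # hidden size hypothetical; D = 3n = 6
W = rng.normal(scale=0.1, size=(4 * H, H + D))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(300, D)):
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape)  # (8,)
```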

A supervised domain adaptation algorithm was implemented to extract domain-invariant biomechanical features, enhancing model generalization and performance by designating the training data as the source domain and the test data as the target domain70. In this framework, the label predictor estimates bone stress values, while a domain classifier distinguishes between the source- and target-domain data distributions. Optimization follows an adversarial strategy: the feature extractor and label predictor minimize the task-specific loss (bone stress prediction), while the feature extractor is simultaneously trained to maximize the domain classification loss that the domain classifier minimizes. To achieve this, a gradient reversal layer was integrated, allowing the model to learn domain-invariant features that the domain classifier cannot easily distinguish and thereby improving generalization across subjects and running conditions. The objective function was optimized by identifying the saddle point of:

$$E\left({\theta }_{f},{\theta }_{y},{\theta }_{d}\right)=\frac{1}{n}\mathop{\sum }\limits_{i=1}^{n}{L}_{y}^{i}\left({\theta }_{f},{\theta }_{y}\right)-\lambda \left(\frac{1}{n}\mathop{\sum }\limits_{i=1}^{n}{L}_{d}^{i}\left({\theta }_{f},{\theta }_{d}\right)+\frac{1}{{n}^{{\prime} }}\mathop{\sum }\limits_{i=n+1}^{n+{n}^{{\prime} }}{L}_{d}^{i}\left({\theta }_{f},{\theta }_{d}\right)\right)$$
(5)

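where \({L}_{y}^{i}\) and \({L}_{d}^{i}\) are the label-prediction and domain-classification losses evaluated for the i-th example; \({\theta }_{f}\), \({\theta }_{y}\), and \({\theta }_{d}\) are the parameters of the feature extractor, label predictor, and domain classifier, respectively; n and \({n}^{{\prime} }\) are the numbers of source- and target-domain examples, with target-domain examples indexed \(i=n+1,\ldots ,n+{n}^{{\prime} }\); and \(\lambda\) weights the domain-classification term. At the saddle point, \({\theta }_{f}\) and \({\theta }_{y}\) minimize E, whereas \({\theta }_{d}\) maximizes it. The architecture of the domain adaptation-based LSTM is depicted in Fig. 7. The deep learning models were developed using TensorFlow (version 2.5.0) and trained on an Nvidia Tesla A100 GPU with 80 GB of memory.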
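A gradient reversal layer can be expressed with TensorFlow's `tf.custom_gradient`: it is the identity in the forward pass and multiplies the incoming gradient by -λ in the backward pass. The sketch below wires it into a toy DANN-style training step in which target-domain inputs contribute only to the domain-classification term, as in Eq. (5). Layer sizes, λ = 0.1, the learning rate, and all function names are hypothetical; this is a sketch for a recent TensorFlow 2.x release, not the authors' code.

```python
import tensorflow as tf


def gradient_reversal(x, lam=1.0):
    """Identity in the forward pass; gradients are multiplied by -lam on the way back."""
    @tf.custom_gradient
    def _reverse(t):
        def grad(upstream):
            return -lam * upstream
        return tf.identity(t), grad
    return _reverse(x)


# Toy DANN-style components (sizes are hypothetical).
extractor = tf.keras.layers.LSTM(16)     # feature extractor, theta_f
stress_head = tf.keras.layers.Dense(28)  # label predictor, theta_y (28 stress features)
domain_head = tf.keras.layers.Dense(1)   # domain classifier, theta_d (logit)
optimizer = tf.keras.optimizers.Adam(1e-3)
mse = tf.keras.losses.MeanSquaredError()
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)


def train_step(x_src, y_src, x_tgt, lam=0.1):
    with tf.GradientTape() as tape:
        f_src = extractor(x_src)
        f_tgt = extractor(x_tgt)
        # Task loss uses labelled source-domain data only.
        task_loss = mse(y_src, stress_head(f_src))
        # Domain labels: 0 = source, 1 = target.
        f_all = tf.concat([f_src, f_tgt], axis=0)
        d_true = tf.concat([tf.zeros_like(f_src[:, :1]), tf.ones_like(f_tgt[:, :1])], axis=0)
        domain_loss = bce(d_true, domain_head(gradient_reversal(f_all, lam)))
        total = task_loss + domain_loss
    variables = (extractor.trainable_variables + stress_head.trainable_variables
                 + domain_head.trainable_variables)
    grads = tape.gradient(total, variables)
    optimizer.apply_gradients(zip(grads, variables))
    return float(task_loss), float(domain_loss)
```

Minimizing `task_loss + domain_loss` with the reversal in place updates the domain classifier to reduce the domain loss, while the reversed gradient pushes the feature extractor to increase it; this realizes the saddle-point search of Eq. (5), with λ setting how strongly domain confusion is weighted against stress prediction for the feature extractor.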

Fig. 7: Architecture of the proposed domain adaptation-based LSTM.

a Illustration of the bidirectional LSTM; b Use of a gradient reversal layer to learn domain-invariant features; c Demonstration of a single LSTM unit. Note: LSTM long short-term memory.

Hyperparameter tuning was conducted using Optuna 3.3.0, which performs Bayesian hyperparameter optimization with the Tree-structured Parzen Estimator algorithm. To improve computational efficiency, each scenario underwent 100 trials, with early pruning of unpromising ones. The sampler initially operates as a random sampler, recording hyperparameter settings and objective values from previous trials; it then suggests hyperparameter values for subsequent trials based on the most promising past results. Hyperparameters, including the optimizer, learning rate, activation function, dropout rate, number of epochs, batch size, number of convolutional layers, number of MLP layers, number of LSTM units, number of TCN filters, TCN dilations, convolutional kernel size, and number of dense units, were tuned to obtain optimal solutions across the different scenarios. The hyperparameter tuning results are shown in Supplementary Table 2. Attention mechanisms were employed to enhance interpretability by identifying the most critical features in the sensor data for stress prediction. Specifically, weights from the bidirectional LSTM were fed into a self-attention layer, and the resulting scores were normalized and visualized. Evaluation metrics, including Pearson's product-moment correlation coefficient (r), mean squared error (MSE), RMSE, mean error (ME), mean absolute error (MAE), and mean absolute percentage error (MAPE), were calculated to assess prediction performance by comparing the FE-derived reference values with the predicted values on the test sets.
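For reference, the evaluation metrics can be computed as in the following NumPy sketch. It assumes that errors are defined as predicted minus reference (the sign convention for ME is not stated in the study) and that no reference value is zero, since MAPE is undefined otherwise. The function name is hypothetical.

```python
import numpy as np


def evaluation_metrics(reference, predicted):
    """Agreement metrics between FE reference stresses and model predictions."""
    ref = np.asarray(reference, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = pred - ref  # assumed sign convention: predicted minus reference
    mse = float(np.mean(err ** 2))
    return {
        "r": float(np.corrcoef(ref, pred)[0, 1]),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "ME": float(np.mean(err)),
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(np.mean(np.abs(err) / np.abs(ref)) * 100.0),  # undefined if ref has zeros
    }


m = evaluation_metrics([1.0, 2.0, 4.0], [1.5, 2.5, 3.5])
print(round(m["RMSE"], 3), round(m["MAPE"], 2))  # 0.5 29.17
```

Because MAPE divides by the reference value, the same absolute error weighs more on low-stress bones, which is consistent with the calcaneus and talus showing lower RMSE but higher MAPE for mean stress.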

Statistical analysis

An independent t-test was conducted to assess the statistical differences in peak von Mises stress between rearfoot and non-rearfoot strikers. This comparison was included to validate the framework’s ability to detect biomechanically meaningful differences between distinct gait patterns. Additionally, we compared the predictive accuracy of peak stresses between rearfoot and non-rearfoot strikers using RMSE. When the Shapiro-Wilk test indicated a violation of the normality assumption, the Wilcoxon rank-sum test was employed for statistical analysis. The significance level was set at p < 0.05.
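The test-selection logic can be sketched with SciPy's `stats` module. The sketch assumes that normality is checked separately in each group at α = 0.05 and that Student's (equal-variance) t-test is used; the study does not state whether a pooled or Welch variance estimate was applied. The function name and sample data are hypothetical.

```python
import numpy as np
from scipy import stats


def compare_groups(rearfoot, non_rearfoot, alpha=0.05):
    """Return (test name, p-value): independent t-test if both groups pass
    Shapiro-Wilk at `alpha`, otherwise the Wilcoxon rank-sum test."""
    a = np.asarray(rearfoot, dtype=float)
    b = np.asarray(non_rearfoot, dtype=float)
    normal = stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha
    if normal:
        return "independent t-test", float(stats.ttest_ind(a, b).pvalue)
    return "Wilcoxon rank-sum", float(stats.ranksums(a, b).pvalue)


# Hypothetical peak-stress samples (MPa) with the study's group sizes (38 vs. 12).
rng = np.random.default_rng(42)
name, p = compare_groups(rng.normal(30, 5, 38), rng.normal(31, 5, 12))
print(name, round(p, 3))
```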

The performance of the predictive model was evaluated by calculating Pearson’s correlation coefficient (r) to assess the strength of the linear relationship between predicted and actual bone stresses, and by generating Bland-Altman plots to examine the agreement between these values. The Bland-Altman analysis involved calculating the mean difference and the limits of agreement (mean difference ± 1.96 SD) to identify any systematic errors or trends across the range of observed data.
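The Bland-Altman quantities, together with a simple check for proportional bias (a linear fit of the differences against the pairwise means, which captures the kind of systematic residual trend described in the Discussion), can be computed as follows. The sketch assumes differences are predicted minus reference and uses the sample standard deviation; it is illustrative, and the function name is hypothetical.

```python
import numpy as np


def bland_altman(reference, predicted):
    ref = np.asarray(reference, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    diff = pred - ref                          # assumed sign: predicted minus reference
    pair_mean = (pred + ref) / 2.0
    bias = float(diff.mean())                  # mean difference
    sd = float(diff.std(ddof=1))               # sample SD of the differences
    slope, _ = np.polyfit(pair_mean, diff, 1)  # nonzero slope suggests proportional bias
    return {
        "r": float(np.corrcoef(ref, pred)[0, 1]),
        "bias": bias,
        "limits_of_agreement": (bias - 1.96 * sd, bias + 1.96 * sd),
        "proportional_bias_slope": float(slope),
    }


ref = np.arange(1, 11, dtype=float)              # hypothetical reference stresses (MPa)
res = bland_altman(ref, 1.1 * ref)               # predictions 10% too high
print(round(res["proportional_bias_slope"], 4))  # 0.0952
```

In this example r is 1.0 even though every prediction is 10% too high, illustrating why correlation and agreement are reported together.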