A deep learning algorithm for automatic 3D segmentation and quantification of hamstrings musculotendon injury from MRI

Riem, Lara; DuCharme, Olivia; Coggins, Ashley; Kenney, Allison; Cousins, Matthew; Feng, Xue; Hein, Raina; Buford, Max; Lee, Kenneth; Opar, David; Heiderscheit, Bryan; Blemker, Silvia S.

doi:10.1038/s41598-025-16926-1

Article
Open access
Published: 29 September 2025

Scientific Reports volume 15, Article number: 33497 (2025)

Abstract

In high-velocity sports, hamstring strain injuries are a common cause of missed play and have high rates of reinjury. Evaluating the severity and location of a hamstring strain injury aids in guiding athlete recovery; this is currently done by a clinician using a semiqualitative muscle injury classification score (e.g., the British Athletics Muscle Injury Classification, BAMIC) that describes edema presence and location. In this study, artificial intelligence (AI) models were developed and deployed to automatically segment edema from T2-weighted magnetic resonance images (MRI) and hamstring muscle and tendon structures from T1-weighted MRI. MR scans were collected from collegiate football athletes at time of hamstring injury and at return to sport. Volume, length, and cross-sectional area (CSA) measurements were performed on all structures and subregions (i.e., free tendon and aponeurosis). The edema and hamstring muscle/tendon AI models compared favorably with ground-truth segmentations. AI volumetric output correlated with ground truth for edema (R = 0.97), hamstring muscles (R ≥ 0.99), and hamstring tendon (R ≥ 0.42) structures. Edema volume and percentage of muscle impacted by edema significantly increased with clinical BAMIC grade (p < 0.05). Taken together, these results demonstrate a promising new approach for AI-based quantification of edema that reflects differing levels of injury severity and supports clinical validity.

Introduction

Hamstring strains account for 10% of field-sport injuries¹, with higher rates in high school and collegiate football players². In the National Football League (NFL), 40% of athletes experience lower extremity injuries, with hamstring strains being the most common³. These injuries result in significant training and competition time loss, and hamstring re-injuries occur at high rates⁴. Currently, assessments of injury severity are limited to qualitative estimates of edema size and location from magnetic resonance imaging (MRI)^5,6,7,8. Because these approaches rely heavily on subjective interpretation, there is high variability in the diagnosis, treatment planning, and outcomes of hamstring injury management.

Given the limitations of subjective MRI assessments, there is growing interest in developing quantitative MRI-based metrics to enhance the accuracy and predictive power of hamstring injury assessments. These metrics include measurements such as T2 mapping⁹, tissue diffusivity¹⁰, and edema volume¹¹ at the time of injury. However, obtaining these metrics requires manual segmentation of edema, tendon, and muscle, a process that is time-intensive, costly, and user-dependent. These constraints hinder the feasibility of the large-scale studies needed to test their predictive capacity, and ultimately limit the clinical viability of quantitative MRI assessments.

This study introduces an artificial intelligence (AI) method designed to automatically quantify hamstring structures and edema in 3D from MR images. Two AI models were developed and validated: the first segments all four hamstring muscles and five associated tendons from T1-weighted images, while the second identifies and quantifies edema using T2-weighted images. The integration of these models allows for a comprehensive anatomical analysis through registered T1 and T2 images. Manual 3D segmentation served as the ground truth for AI validation, with inter- and intra-observer repeatability assessed alongside AI-to-ground-truth comparisons. The AI-derived segmentations enabled quantification of edema size, affected structures, and injury location. These metrics were evaluated across British Athletics Muscle Injury Classification (BAMIC) grades in collegiate athletes with varying hamstring strain injuries to assess whether the AI metrics could distinguish injury severity in line with clinical grading. The primary aim of this study was to evaluate the accuracy of these AI-derived segmentations and assess their potential to quantify injury severity in alignment with clinical BAMIC grading.

Methods

Dataset

This dataset consisted of 361 T1-weighted and 140 T2-weighted bilateral thigh MRI scans acquired as part of an ongoing multicenter (five separate sites) prospective cohort study involving male Division I collegiate American football athletes (ClinicalTrials.gov ID: NCT05343052)¹². If an athlete sustained a hamstring strain injury, they received both a T1 and a T2 scan at time of injury (TOI) and at return to sport (RTS). An additional 27 T1 scans and 98 T2 scans were obtained retrospectively from one study site monitoring collegiate athletes and healthy adults from 2017 to 2022; these were used for AI training and repeatability analysis only, and only 86 of the 98 T2 scans were used for training due to processing constraints. In total, 388 T1-weighted and 238 T2-weighted scans were utilized for analysis of hamstring tendons and edema, respectively. All locations acquiring MR scans or providing retrospective datasets were approved for data collection by an institutional review board (IRB) and complied with Health Insurance Portability and Accountability Act guidelines. All subjects provided informed consent for data collection and aggregation, consistent with the IRBs. Financial support was provided by a research grant from the NFL, and the National Basketball Association & GE HealthCare Orthopedics and Sports Medicine Collaboration. Springbok Analytics processed and analyzed all data; personnel at University of Wisconsin-Madison were responsible for determining scan inclusion/exclusion criteria and for the clinical BAMIC scoring performed on the T2 images¹². Inclusion criteria for this study were adult student-athletes actively rostered on the varsity football team of their institution. Exclusion criteria were MRI contraindications and malignant disease history¹².

All T1- and T2-weighted MRIs were acquired in the axial plane with an in-plane isotropic resolution of ~ 1.1 mm and a slice thickness of ~ 5 mm, capturing the full femur. Proximal hamstring tendon coverage on the T1-weighted images began at the origin of the biceps femoris-semitendinosus tendon and the semimembranosus proximal tendon at the ischial tuberosity, which was fully captured in the protocol because the full femoral head and hip joint were included in the scanning region. Distal coverage past the femur through the tibial tuberosity on the T1-weighted images varied by scan and resulted in varying coverage of the semitendinosus distal tendon, which inserts at the pes anserinus on the medial aspect of the tibia. If distal coverage of this tendon was truncated, the segmentation was continued to the most distal image and treated as fully captured for analyses of AI performance and repeatability. T2-weighted images had a similar coverage range that focused on capturing the full femur and the regions where edema was localized around the site of injury.

AI framework development

Manual segmentation of hamstring muscles, tendons, and edema was performed using 3D Slicer (v4.11)¹³. Segmentation was completed in two parts: (1) HTEN, for hamstring muscles and tendons using T1-weighted images, and (2) EDEMA, for edema using T2-weighted images. HTEN segmentation followed defined anatomical boundaries (the methodology was approved by clinicians). EDEMA segmentation identified regions with brighter contrast: a combination of local Otsu intensity filters and anatomical knowledge of typical edema shape and presentation was used to segment the edema. Bright regions due to image bias were not included in the edema segmentation. This approach to segmenting edema is more conservative. The methodology for edema and fluid segmentation was approved and monitored by a radiologist. Ground-truth labels were created via manual segmentation by trained engineers; if a segmentation was performed by a junior engineer, a senior engineer verified it to ensure accurate and consistent quantification. All EDEMA and HTEN segmentation approaches received expert approval, and any unclear segmentations were escalated from engineers to senior engineers and subject-matter experts for resolution. The segmentation process, reviewed by at least one senior engineer, produced both images and labels for all HTEN and EDEMA scans. See Fig. 1 for the HTEN and EDEMA segmentation atlases and Fig. 2a for detailed descriptions and visualizations of the regions of interest (ROIs) and the AI process. HTEN and EDEMA components acquired in the same session were registered using ANTs registration¹⁴. Specifically, rigid transforms were used to overlay the T1- and T2-weighted images and segmentations so that all labels shared the same reference orientation and position. Patients did not move between their T1 and T2 scans; therefore, a nearly perfect overlay was expected and was visually confirmed.
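
As an illustration of the intensity step only, the following minimal sketch computes a global Otsu threshold over a list of integer intensities by maximizing the between-class variance. It is not the authors' pipeline: the source describes local Otsu filters combined with anatomical knowledge inside a manual workflow, and the function name `otsu_threshold`, the 256-level histogram, and the toy intensities are assumptions made here for illustration.

```python
from typing import Sequence


def otsu_threshold(intensities: Sequence[int], levels: int = 256) -> int:
    """Return the threshold t that maximizes between-class variance.

    Values <= t form the dark class; values > t form the bright class.
    Intensities are assumed to be integers in [0, levels).
    """
    hist = [0] * levels
    for value in intensities:
        hist[value] += 1

    total = len(intensities)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    n_dark = 0    # number of values in the dark class so far
    sum_dark = 0  # intensity sum of the dark class so far
    for t in range(levels):
        n_dark += hist[t]
        sum_dark += t * hist[t]
        n_bright = total - n_dark
        if n_dark == 0 or n_bright == 0:
            continue  # one class is empty, so the variance is undefined
        mean_dark = sum_dark / n_dark
        mean_bright = (sum_all - sum_dark) / n_bright
        between = n_dark * n_bright * (mean_dark - mean_bright) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t


# Toy intensities (hypothetical): dark muscle-like values and bright fluid-like values.
toy = [10] * 50 + [12] * 50 + [200] * 20 + [210] * 20
threshold = otsu_threshold(toy)
bright_count = sum(1 for v in toy if v > threshold)
```

In a local variant, the same computation would be applied within a neighborhood of each voxel instead of over the whole image.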

Two unique AI models were trained (Fig. 2b). The first was trained to segment HTEN ROIs from T1-weighted images. The second was trained to segment EDEMA from T2-weighted images. All AI models utilized a modified 3D U-Net structure¹⁵ to segment based on pixelwise prediction maps. Every level in the encoder contains two blocks, each consisting of a 3 × 3 × 3 convolution layer, a batch normalization (BN) layer, and a rectified linear unit (ReLU) activation layer; the two blocks are followed by 2 × 2 × 2 max-pooling at every level except the bottom-most. In the decoder, each level consists of a 2 × 2 × 2 deconvolution layer followed by two blocks of a 3 × 3 × 3 convolution, a BN, and a ReLU layer. In addition, feature maps from the encoder were concatenated to those of the same resolution in the decoder as skip connections. The final block of the network contains a 1 × 1 × 1 convolution layer to reduce the dimension of the features to match the number of label maps, followed by a pixelwise softmax classifier. Prior to AI input, all images and labels were resized to have a consistent in-plane resolution/minimum shape. Upon deployment, final label maps were resized and oriented to the original resolution and form of the input images.
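
To make the encoder/decoder symmetry concrete, the sketch below traces only the spatial feature-map sizes through such a network. It assumes size-preserving ("same") padding in the convolutions and stride-2 pooling and deconvolution; the number of levels (4) and the input window (32 × 128 × 128) are hypothetical, since the source does not report them.

```python
from typing import List, Tuple

Shape = Tuple[int, ...]


def unet_shapes(input_shape: Shape, levels: int = 4) -> Tuple[List[Shape], List[Shape]]:
    """Trace spatial sizes at each encoder and decoder level of a 3D U-Net."""
    encoder: List[Shape] = []
    shape = tuple(input_shape)
    for level in range(levels):
        # Two (3x3x3 conv -> BN -> ReLU) blocks; "same" padding keeps the size.
        encoder.append(shape)
        if level < levels - 1:
            # 2x2x2 max-pooling halves every axis, except at the bottom-most level.
            if any(size % 2 for size in shape):
                raise ValueError(f"shape {shape} is not divisible by 2 at level {level}")
            shape = tuple(size // 2 for size in shape)

    decoder: List[Shape] = []
    for level in range(levels - 2, -1, -1):
        # 2x2x2 deconvolution doubles every axis.
        shape = tuple(size * 2 for size in shape)
        # The skip connection concatenates encoder features of the same resolution.
        if shape != encoder[level]:
            raise ValueError("decoder and encoder resolutions do not match")
        decoder.append(shape)
    return encoder, decoder


encoder_shapes, decoder_shapes = unet_shapes((32, 128, 128), levels=4)
```

The divisibility check shows why inputs are typically resized or padded to a consistent minimum shape before entering such a network.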

The algorithms were implemented in TensorFlow, and training and testing were performed on two NVIDIA 1080 Ti GPUs with 11 GB of memory each. During training, weights were initialized randomly from a Gaussian distribution and updated using the adaptive moment estimation (Adam) optimizer, with an initial learning rate of 0.01 and the sum of the pixelwise Dice loss and cross-entropy as the loss function. Due to memory limitations, the batch size was set to 1. Extensive data augmentation, including shearing and rotation, was applied during training. The training time for a stage was approximately 48 h with 3000 epochs. Deployment took under 10 min per scan on one GPU. During training, a window was randomly selected to produce a model robust to variation in ROI position; at deployment, a sliding window was utilized.
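
The combined loss can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' TensorFlow code: the equal weighting of the two terms, the per-class averaging of the soft Dice term, and the smoothing constant `eps` are assumptions, as the source states only that the loss was pixelwise Dice loss plus cross-entropy.

```python
import math
from typing import Sequence


def dice_plus_cross_entropy(
    probs: Sequence[Sequence[float]],
    onehot: Sequence[Sequence[int]],
    eps: float = 1e-7,
) -> float:
    """Soft Dice loss plus cross-entropy.

    probs[v][c]  : softmax probability of class c at voxel v
    onehot[v][c] : 1 if voxel v belongs to class c in the ground truth, else 0
    """
    n_voxels = len(probs)
    n_classes = len(probs[0])

    # Cross-entropy, averaged over voxels (probabilities clipped to avoid log(0)).
    cross_entropy = -sum(
        g * math.log(max(p, eps))
        for p_voxel, g_voxel in zip(probs, onehot)
        for p, g in zip(p_voxel, g_voxel)
    ) / n_voxels

    # Soft Dice per class, averaged over classes.
    dice_sum = 0.0
    for c in range(n_classes):
        intersection = sum(p[c] * g[c] for p, g in zip(probs, onehot))
        denominator = sum(p[c] for p in probs) + sum(g[c] for g in onehot)
        dice_sum += (2.0 * intersection + eps) / (denominator + eps)
    dice_loss = 1.0 - dice_sum / n_classes

    return dice_loss + cross_entropy


truth = [[1, 0], [0, 1]]
perfect_loss = dice_plus_cross_entropy([[1.0, 0.0], [0.0, 1.0]], truth)
uniform_loss = dice_plus_cross_entropy([[0.5, 0.5], [0.5, 0.5]], truth)
```

A perfect prediction drives both terms to zero, while an uninformative prediction is penalized by both the overlap term and the log-likelihood term.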

For AI training of the HTEN and EDEMA models, two sites from the multicenter prospective cohort study (ClinicalTrials.gov ID: NCT05343052)¹² were randomly excluded for AI validation (each with T1 and T2 images due to injury), along with three subjects from training sites. These two imaging sites were intentionally excluded to serve as an external validation set, an approach that ensures that evaluation metrics reflect performance on unseen data from independent sources. The HTEN AI was trained on 360 datasets, and the EDEMA AI on 198 datasets. Each training image and label represented a bilateral set of a patient’s legs. Some had bilateral edema (edema present on both limbs) and others had edema on a single limb only; however, edema was always present on at least one side of the patient. This allowed the AI to be trained on cases both with and without edema. Specifically, 55.6% of the limb sides used for training had edema present. There was also diversity in edema size and location; see Supplemental Material for a figure displaying the distribution of edema sizes (volume in ml). Validation was conducted on 18 unique athletes with 22 unique injuries across varying time points, totaling 28 datasets with both HTEN and EDEMA analyses (Table 1). For all 28 validation datasets, a radiologist reviewed the T2 images and assigned a BAMIC grade.

Table 1 Description of validation dataset. BAMIC anatomical site is myofascial (a), myotendinous junction/muscular (b), or tendinous (c). Injury status was either time of injury (TOI) or return to sport (RTS).

Metric quantification

The following metrics were obtained from ROIs created by the EDEMA AI: volume, maximum cross-sectional area (CSA, in the axial plane), and inferior-superior length of the edema present (Fig. 3A). The following metrics were obtained from ROIs created by the HTEN AI: Dice similarity coefficient¹⁶ (DSC, muscle only), volume, length, maximum CSA, and surface area. For tendon ROIs, volume, length, and maximum CSA were reported for the total tendon, free tendon (slices with only tendon pixels), and aponeurosis (slices with tendon and associated muscle pixels). DSC, a metric of similarity between segmentations that quantifies overlap (0 indicates no overlap and 1 indicates perfect overlap)¹⁶, was not calculated for tendon ROIs because of the low volume of these segments, for which small differences in overlap may produce low DSC values. Additional metrics included myotendinous complex volume (muscle and associated tendons) and myotendinous junction surface area (interface between tendon aponeurosis and associated muscle), as shown in Fig. 3B.
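
The volume, maximum CSA, length, and DSC definitions above can be sketched for a binary mask stored as nested lists. The defaults of 1.1 mm in-plane resolution and 5 mm slice thickness follow the acquisition described earlier; the function names, the mask layout `mask[slice][row][col]`, and the convention that length spans the first to the last occupied slice are assumptions for illustration.

```python
from typing import Dict, Sequence

Mask = Sequence[Sequence[Sequence[int]]]  # mask[slice][row][col]; 1 = voxel in ROI


def roi_metrics(mask: Mask, in_plane_mm: float = 1.1, slice_mm: float = 5.0) -> Dict[str, float]:
    """Volume (ml), maximum axial CSA (mm^2), and inferior-superior length (mm)."""
    pixel_area_mm2 = in_plane_mm * in_plane_mm
    voxels_per_slice = [sum(sum(row) for row in axial) for axial in mask]
    occupied = [i for i, count in enumerate(voxels_per_slice) if count > 0]
    if not occupied:
        return {"volume_ml": 0.0, "max_csa_mm2": 0.0, "length_mm": 0.0}
    volume_mm3 = sum(voxels_per_slice) * pixel_area_mm2 * slice_mm
    return {
        "volume_ml": volume_mm3 / 1000.0,  # 1 ml = 1000 mm^3
        "max_csa_mm2": max(voxels_per_slice) * pixel_area_mm2,
        "length_mm": (occupied[-1] - occupied[0] + 1) * slice_mm,
    }


def dice_coefficient(a: Mask, b: Mask) -> float:
    """DSC = 2|A and B| / (|A| + |B|); returns 1.0 when both masks are empty."""
    flat_a = [v for axial in a for row in axial for v in row]
    flat_b = [v for axial in b for row in axial for v in row]
    intersection = sum(1 for x, y in zip(flat_a, flat_b) if x and y)
    total = sum(1 for x in flat_a if x) + sum(1 for y in flat_b if y)
    return 2.0 * intersection / total if total else 1.0


# Toy 3-slice masks (hypothetical), each slice 2 x 2 pixels.
mask_a = [[[0, 0], [0, 0]], [[1, 1], [0, 0]], [[1, 1], [1, 0]]]
mask_b = [[[0, 0], [0, 0]], [[1, 0], [0, 0]], [[1, 1], [1, 0]]]
metrics = roi_metrics(mask_a, in_plane_mm=1.0, slice_mm=5.0)
dsc = dice_coefficient(mask_a, mask_b)
```

In the toy example a single missing voxel lowers the DSC from 1 to 8/9, which illustrates why DSC is sensitive for very small structures such as tendons.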

Inter/intraobserver repeatability and AI validation

The interobserver and intraobserver repeatability of manually segmenting or quality-assuring HTEN and EDEMA segmentations was assessed. All segmentations were conducted by trained engineers following the segmentation methodologies outlined in Fig. 1. The development of the segmentation atlases that guided segmentation was reviewed and approved by subject-matter experts to ensure accurate and consistent segmentation of musculoskeletal and tendinous structures, as well as fluid. Any abnormal muscle or tendon morphology, or any segmentation that did not align with the atlas, was reviewed by multiple trained engineers and experts to arrive at a consensus on the structure segmentation.

To assess EDEMA interobserver repeatability, two engineers independently segmented 86 scans from the retrospective one-site study. EDEMA intraobserver repeatability was assessed through the random repetition of 12–13 scans from this cohort by each engineer, resulting in a total of 25 comparisons.

HTEN interobserver repeatability, for both tendon and muscle ROIs, was assessed through the segmentation of one scan by two engineers and another scan by three engineers, resulting in a total of eight interobserver comparisons per ROI. Intraobserver repeatability was evaluated by having two engineers segment a scan twice, yielding a total of four comparisons per ROI.

To validate the AI, the EDEMA and HTEN metrics predicted by the AI were compared with ground-truth segmentation for all 28 validation datasets. This resulted in 28 comparisons for the EDEMA ROI and 56 comparisons for the HTEN ROIs, both tendons and muscles, due to bilateral coverage. None of the 28 validation datasets were included in the training of the AI models.

For each comparison, the average difference, average coefficient of variation (calculated across observations per subject and then averaged), correlation, and a Bland-Altman analysis to test for proportional bias were analyzed; see Tables 2 and 3 for comparison counts per ROI and the metrics analyzed.
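
A minimal sketch of these agreement statistics for paired measurements is given below. The source does not specify how proportional bias was tested; regressing the paired differences on the paired means and reporting the slope's t statistic with n − 2 degrees of freedom is one common choice and is an assumption here, as are the function names and the toy values.

```python
import math
from statistics import mean, stdev
from typing import Dict, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def agreement_summary(first: Sequence[float], second: Sequence[float]) -> Dict[str, float]:
    """Mean difference, mean per-subject CoV (%), Pearson R, and Bland-Altman slope test."""
    n = len(first)
    if n < 3 or n != len(second):
        raise ValueError("need at least three paired observations")
    diffs = [a - b for a, b in zip(first, second)]
    means = [(a + b) / 2.0 for a, b in zip(first, second)]

    # Coefficient of variation per subject (sample SD / mean), then averaged.
    cov = mean([stdev([a, b]) / m * 100.0 for a, b, m in zip(first, second, means)])

    # Proportional bias: ordinary least-squares slope of difference versus mean.
    mm, md = mean(means), mean(diffs)
    sxx = sum((m - mm) ** 2 for m in means)
    slope = sum((m - mm) * (d - md) for m, d in zip(means, diffs)) / sxx
    intercept = md - slope * mm
    sse = sum((d - (intercept + slope * m)) ** 2 for m, d in zip(means, diffs))
    se = math.sqrt(sse / (n - 2) / sxx)
    if se > 0:
        t_stat = slope / se
    else:
        t_stat = 0.0 if slope == 0 else math.copysign(math.inf, slope)

    return {
        "mean_difference": md,
        "mean_cov_percent": cov,
        "r": pearson_r(first, second),
        "bias_slope": slope,
        "bias_t": t_stat,
        "df": n - 2,
    }


# Toy paired volumes (hypothetical, not study data).
summary = agreement_summary([10.0, 20.0, 30.0, 40.0], [11.0, 19.0, 31.0, 39.0])
```

A p-value for `bias_t` would come from a t distribution with `df` degrees of freedom, for example via a statistics library.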

Table 2 The interobserver repeatability, intraobserver repeatability, and AI validation results for the EDEMA scan ROI and for the HTEN scan tendon ROIs for the examined metrics (volume, length, max CSA, and surface area). The HTEN scan results are for the whole tendon; the results for the free tendon and aponeurosis regions are in Supplemental Material. Average difference, average coefficient of variation (%), correlation coefficient (R), and Bland-Altman assessment for proportional bias are reported. Significance levels of the measures reported are denoted using: *p < 0.05, **p < 0.01, †p < 0.001.

Table 3 The AI validation, interobserver, and intraobserver repeatability results for the main metrics derived in the HTEN analysis (volume, length, max cross-sectional area (CSA), and surface area) for muscle ROIs across the whole muscle structure. Average difference, average coefficient of variation (%), correlation coefficient (R), DSC (reported for the full 3D structure only in the volume row), and Bland-Altman assessment for proportional bias are reported. Significance levels of the measures reported are denoted using: *p < 0.05, **p < 0.01, †p < 0.001.

AI edema and hamstring segmentation quantification vs. BAMIC grades

The corresponding HTEN and EDEMA scans were overlaid onto one another for all 28 AI validation outputs. The overlap of the edema pixels with the hamstring tendon and muscle pixels was used to determine the percentage of intramuscular (muscle-specific), intratendinous, and extramuscular edema, as well as the percentage of the muscle impacted by edema (Fig. 3c).
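
The overlap computation can be sketched on registered, flattened label arrays as follows. The label integers, the function name `edema_overlap`, and the treatment of extramuscular edema as edema outside both muscle and tendon labels are assumptions for illustration; the source does not give these implementation details.

```python
from typing import Dict, Sequence, Set


def edema_overlap(
    edema: Sequence[int],
    hten: Sequence[int],
    muscle_ids: Set[int],
    tendon_ids: Set[int],
) -> Dict[str, object]:
    """Split edema voxels by location and report the share of each muscle affected.

    edema[v] : 1 if voxel v is edema, else 0 (from the registered T2 segmentation)
    hten[v]  : structure label at voxel v (0 = background) from the T1 segmentation
    """
    edema_total = sum(1 for e in edema if e)
    muscle_size = {m: 0 for m in sorted(muscle_ids)}
    edema_in_muscle = {m: 0 for m in sorted(muscle_ids)}
    edema_in_tendon = 0

    for e, label in zip(edema, hten):
        if label in muscle_size:
            muscle_size[label] += 1
            if e:
                edema_in_muscle[label] += 1
        elif e and label in tendon_ids:
            edema_in_tendon += 1

    intramuscular = sum(edema_in_muscle.values())
    extramuscular = edema_total - intramuscular - edema_in_tendon

    def pct(part: int, whole: int) -> float:
        return 100.0 * part / whole if whole else 0.0

    impacted = {m: pct(edema_in_muscle[m], muscle_size[m]) for m in muscle_size}
    return {
        "intramuscular_pct": pct(intramuscular, edema_total),
        "intratendinous_pct": pct(edema_in_tendon, edema_total),
        "extramuscular_pct": pct(extramuscular, edema_total),
        "muscle_impacted_pct": impacted,
        "max_muscle_impacted_pct": max(impacted.values()) if impacted else 0.0,
    }


# Toy labels (hypothetical): muscles 1 and 2, tendon 5, background 0.
overlap = edema_overlap(
    edema=[1, 1, 1, 1, 0, 0, 0, 0],
    hten=[1, 1, 5, 0, 1, 1, 2, 2],
    muscle_ids={1, 2},
    tendon_ids={5},
)
```

The `max_muscle_impacted_pct` entry corresponds to the percentage of the muscle most impacted by edema that is compared with BAMIC grades.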

To compare the AI method for quantifying hamstring injury with clinical BAMIC grading, AI metrics and the percentage of muscle most impacted by edema were related to BAMIC grades for the 28 validation scans. For the assessment of AI and ground-truth segmentations and the corresponding BAMIC grades at myofascial and myotendinous/muscular sites, grades 0/1 and grades 2/3 were grouped. This grouping was used because of the limited range of BAMIC scores reported in this dataset (0 to 3 only) and because the BAMIC grading scale defines minimal to mild injury presence for grades 0/1, as opposed to moderate and high injury presence via STIR signal for grades 2/3. The grouping allowed the ranges of injuries observed in these discrete injury classifications to be assessed for statistically significant differences as severity increases from mild to moderate/high. Unpaired t-tests comparing grades 0/1 with grades 2/3 were used for myofascial and myotendinous/muscular sites. For tendinous sites, which can receive a BAMIC grade no lower than 2, an unpaired t-test comparing BAMIC grades 2 and 3 was utilized. No BAMIC grades of 4 were present in the validation set. These t-tests were used to investigate whether the metrics increase with increasing BAMIC grade. To determine whether discrepancies between AI segmentations and BAMIC scores arose from AI errors or from differences between volume-based and qualitative grading methods, the same analysis was repeated using manually vetted labels.
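
The grouping and comparison can be sketched as follows. The source does not state whether equal variances were assumed, so the pooled-variance (Student) form of the unpaired t statistic used here is an assumption, as are the function names and the toy values, which are not study data.

```python
import math
from statistics import mean, variance
from typing import List, Sequence, Tuple


def group_by_bamic(
    values: Sequence[float], grades: Sequence[int]
) -> Tuple[List[float], List[float]]:
    """Split a metric into low (grades 0/1) and high (grades 2/3) BAMIC groups."""
    low = [v for v, g in zip(values, grades) if g in (0, 1)]
    high = [v for v, g in zip(values, grades) if g in (2, 3)]
    return low, high


def unpaired_t(group_a: Sequence[float], group_b: Sequence[float]) -> Tuple[float, int]:
    """Pooled-variance t statistic for mean(group_b) - mean(group_a) and its df."""
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return (mean(group_b) - mean(group_a)) / standard_error, df


# Toy edema volumes in ml with BAMIC grades (hypothetical values).
low_group, high_group = group_by_bamic(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [0, 1, 1, 2, 3, 2]
)
t_stat, dof = unpaired_t(low_group, high_group)
```

A positive `t_stat` indicates that the metric is larger in the higher-grade group; the p-value follows from a t distribution with `dof` degrees of freedom.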

Statistical analysis

Analyses were performed with Python, version 3.9.6 (Python Software Foundation). The statistical codes are available upon request.

Results

Inter/intraobserver repeatability and AI validation

For the EDEMA and HTEN repeatability analyses, interobserver and intraobserver results for edema fluid segmentation (Table 2), tendon structure segmentation (Table 2), and hamstring muscle segmentation (Table 3) exhibited strong correlations and low average differences, with intraobserver measures tending to be more consistent between comparisons.

EDEMA analysis interobserver and intraobserver correlations (R ≥ 0.92 across all edema metrics) exhibited strong agreement and no proportional bias in volumetric or area measures (p > 0.05), showing consistent quantification both within and between observers (Table 2). Only intraobserver measures of edema length exhibited proportional bias (t(24) = −2.27, p = 0.03); however, this same measure in the interobserver results exhibited no proportional bias (t(98) = 0.21, p = 0.83).

HTEN analysis interobserver and intraobserver volumetric measures at the total tendon level exhibited strong positive correlations, with R ≥ 0.72 for almost all tendinous structures (except the semimembranosus distal tendon) (Table 2). Analysis of the free tendon and aponeurosis regions of the tendon structures exhibited similar trends, with most volumetric measures exhibiting strong correlations and no proportional bias between observers (see Supplemental Materials). Interobserver measures exhibited more variability than intraobserver measures, which was expected; however, interobserver results exhibited agreement in tendon observations and low average differences in the main metrics. Similar assessments of the HTEN analysis muscle ROIs exhibited strong agreement in interobserver and intraobserver measures, with DSC values ≥ 0.97 for all structures and significant, strong correlations for nearly all main metrics (Table 3). Interobserver and intraobserver results were also generated for the combined myotendinous structure and the surface area measure of the myotendinous junction interface (see Supplemental Material for results).

The validation of the EDEMA and HTEN AI models demonstrated strong correlations between AI-generated and ground-truth labels across metrics, regions, and ROIs. Correlation coefficients exceeded R = 0.92 for muscle ROIs (Table 3), R = 0.76 for full tendon ROIs (except the biceps femoris-semitendinosus joint tendon: R = 0.42) (Table 2), and R = 0.79 for the edema ROI (with R = 0.97 for edema volume) (Table 2). A significant bias was observed for edema ROIs: the AI model volumes were on average lower than ground truth, indicating that the AI tends to segment edema conservatively compared to manual segmentation (Fig. 4). Similarly, tendon ROIs showed a significant bias, with AI segmentations reporting smaller values than the vetted observations (Fig. 4). Additionally, certain tendons, specifically the biceps femoris-semitendinosus joint tendon, exhibited a number of data points with larger discrepancies between AI and manual segmentation observations (Fig. 4). In the validation set, this tendon structure was the most commonly injured and was often characterized by morphological changes at the injury site that may be under-segmented by the AI but included in the segmentations by trained engineers, accounting for discrepancies in observations and contributing to the significant bias observed. For muscle ROIs, biases were noted in the semitendinosus and biceps femoris short head volumes, likely due to muscle pixels being manually reassigned as tendon pixels in the aponeurosis regions. Full results for the interobserver and intraobserver repeatability analyses and the AI validation of HTEN ROIs are reported in Supplemental Materials.

AI edema and hamstring segmentation quantification vs. BAMIC grades

The AI method for quantifying hamstring injury and edema showed a strong trend with the BAMIC method, exhibited through statistically significant differences in edema volumes observed for low and high BAMIC grades at the different sites (Fig. 5a). Edema volume increased significantly with BAMIC grade between grades 0/1 vs. 2/3 for myotendinous and myofascial sites (p = 0.01). Maximum percentage of muscle impacted by edema showed a non-significant increase between grades 0/1 and 2/3 for myotendinous and myofascial sites (p = 0.07). For tendinous sites, the unpaired t-test revealed non-significant differences in total edema volume (p = 0.12) but significant differences in maximum muscle impact between BAMIC grades 2 and 3 (p = 0.02), with grade 3 showing more edema within muscle borders. Intratendinous edema was quantified in all BAMIC sites. Similar trends were observed using manually labeled methods, aligning closely with AI outputs (Fig. 5b). Length and maximum CSA relationships with BAMIC scores were also analyzed and exhibited similar results to the volumetric analyses, where a significant difference was observed between grades 0/1 vs. 2/3 for myotendinous and myofascial sites (p < 0.001) and no significant difference was observed between the 2 and 3 BAMIC grade groups for tendinous sites (see Supplemental Material).

Visual 3D renderings of the EDEMA overlaid with HTEN structures enabled anatomical site identification consistent with BAMIC site grading (Fig. 5c).

Discussion

AI quantification of muscle, tendon, and edema showed strong interobserver and intraobserver segmentation repeatability for hamstring and edema metrics. In comparison, prior work has shown interrater correlations of BAMIC score of 0.64¹⁷. AI outputs correlated well with ground-truth segmentation, with slightly higher error than intraobserver variability. We note that standard deviations for the edema volume differences were relatively large compared to the mean values (Table 2). This reflects the wide variation in edema size across the validation dataset, ranging from small, localized injuries to large, diffuse edema in more severe cases. Despite this variability, the AI outputs exhibited a strong correlation with ground truth (R = 0.97), supporting the robustness of the model across a spectrum of injury presentations. The AI tended to under-segment edema/tendons and over-segment muscles, with reduced accuracy for the biceps femoris-semitendinosus joint tendon due to frequent injuries and morphological abnormalities. Reduced AI performance in variable ROIs (i.e., edema and tendon aponeuroses), particularly at injury sites, demonstrates the importance of quantitative evaluation of both myotendinous structures and fluid following an injury, as edema can impact visualization of surrounding structures. Expanding AI training datasets to include more injury cases with varying morphological presentations, together with post-output editing, can further improve segmentation accuracy.

Previous studies have quantified edema using MRI segmentations¹⁸ and intensity methods¹⁹ to assess quantitative edema measures as an alternative to visual radiological assessment, and have shown improved accuracy of fluid assessment with quantitative measures. Our study introduces an automated (< 1 min) approach that integrates quantitative edema analysis within its surrounding anatomy and pairs it with quantification of muscle and tendon structures. Registered AI outputs for hamstring and edema structures showed significant increases in edema metrics (volume, length, max CSA, impacted structures) with increasing BAMIC scores. Although the relationship was not perfect, AI and manual outputs yielded similar results when compared to BAMIC grades, suggesting that the differences stem from methodological variations (quantitative volume vs. qualitative grading). Additionally, the AI provides intuitive visual categorizations of edema locations aligned with BAMIC site gradings. Pixel overlap quantification and 3D visualization may improve upon BAMIC grading and site assignment, as intratendinous edema can be more definitively distinguished from intra/extramuscular edema. We hope that this proposed methodology can be used as a tool for clinicians to assess, quantify, and visualize hamstring injuries.

Previous studies utilizing AI and semi-automated approaches for MRI anatomy segmentation have primarily focused on muscle and bone structures^16,20,21. In automated musculoskeletal quantification from MR images, average DSC values for lower extremity muscles in the thigh and calf have been reported to range from 0.75 to 0.93^16,22,23. Our results for hamstring muscle DSC values are on the higher end, with all muscle segmentations achieving an average DSC above 0.96 when compared to ground-truth labeling. High DSC values indicate strong agreement, supporting the reliability of our AI model in delineating muscle structures. In this application, where precise anatomical boundaries are critical for volumetric and regional edema quantification, a high DSC provides confidence that downstream metrics reflect true tissue geometry. While these results demonstrate strong segmentation accuracy, differences in datasets, labeling protocols, and model architectures should be considered when making direct comparisons. To our knowledge, no prior study has validated an AI model specifically for automated segmentation of hamstring tendons from MRI. Previous work has demonstrated the ability to manually segment hamstring muscles and tendons from MRI as a gold standard for assessing their volume in comparison to ultrasound²⁴, but these studies have only suggested the potential future use of AI to automate the process. In the context of anterior cruciate ligament injury, AI-based classifiers have been used to grade injury severity from MRI without segmenting individual structures²⁵. Additionally, separate studies have employed hamstring muscle segmentation using diffusion tensor imaging to assess architectural characteristics after intervention²⁶. However, these approaches do not integrate comprehensive muscle, tendon, and edema segmentation into a unified framework to assess injury of hamstring anatomical characteristics.

For edema segmentation, prior work has investigated fluid and edema segmentation in various anatomical regions, including adipose tissue²⁷, thoracic regions²⁸, and the brain²⁹. Some studies have also explored T2 mapping overlays in the thigh region for assessing inflammatory myopathies³⁰. However, to our knowledge, no prior study has specifically focused on AI-based volumetric segmentation of muscular edema injuries. Our study is among the first to develop and validate an AI model for fully automated segmentation of the hamstring muscles and tendons from MRI, generating reliable measures of volume, length, and other quantifiable features important for anatomical and clinical assessment. Furthermore, we are among the first to apply automated edema segmentation specifically to muscle injuries, allowing for an integrated analysis of hamstring muscle and tendon segmentations with edema quantification. This framework enables injury assessment and classification methods similar to BAMIC grading, without requiring an AI classifier, while providing detailed visualization and quantifiable metrics.

Limitations include a small validation dataset (28 scans), mitigated by using diverse injury presentations and validating on scans from MRI/acquisition sites that were not used to train the AI. Another limitation was a somewhat narrow representation of BAMIC grades (i.e., no BAMIC grade 4 injuries), mitigated by analyzing tendinous injury cases separately from the myotendinous and myofascial cases. All participants in this study were young male Division I collegiate football athletes; therefore, we were not able to assess how the proposed methodology would perform in other populations. Our future work will aim to assess this methodology in more diverse populations. Additionally, several outlying data points observed in Fig. 4 reflect known limitations of AI generalization in cases with morphological variability or extreme presentation. Specifically, the biceps femoris-semitendinosus joint tendon, a commonly injured region, exhibited discrepancies likely due to irregular tendon shapes at injury sites that were captured more precisely in manual segmentations. The single large outlier in edema volume corresponds to a visibly hypertrophied athlete with extensive injury-related fluid and, while not an error, highlights the importance of incorporating diverse training data in future models. Lastly, edema segmentation remains subjective, but standardized training and physician consultation improved consistency. While the AI occasionally under-segmented small edemas, this is preferable to over-segmentation and can be addressed with larger, more diverse training datasets.

In conclusion, two AI models for segmenting hamstring muscles, tendons, and edema from MRI scans were developed. Validation against manual segmentation showed strong agreement, low errors, and robust inter/intraobserver repeatability. Combined analysis of hamstring muscle/tendon and edema segmentations revealed similar findings to BAMIC gradings, in which edema volume and percentage of muscle impacted demonstrated significant increases with increasing BAMIC grades. This novel method enables objective, automated, quantitative studies of hamstring injuries, offering detailed visualization of injury extent to potentially improve athlete diagnosis and treatment. Future work will examine the clinical utility of this metric for more effective assessment of hamstrings injury, enabling more personalized return-to-sport criteria and re-injury mitigation.
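For readers unfamiliar with overlap-based validation, the sketch below shows the standard definition of the Dice similarity coefficient (DSC), 2|A ∩ B| / (|A| + |B|), together with a signed relative volume difference that separates under-segmentation from over-segmentation. It is a minimal illustration on flattened binary masks. The function names and toy masks are hypothetical, and the convention of returning 1.0 when both masks are empty is an assumption rather than something stated in the study.

```python
from typing import Sequence


def dice_similarity(reference: Sequence[int], predicted: Sequence[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks."""
    if len(reference) != len(predicted):
        raise ValueError("masks must be defined on the same voxel grid")
    intersection = sum(1 for r, p in zip(reference, predicted) if r and p)
    total = sum(1 for r in reference if r) + sum(1 for p in predicted if p)
    if total == 0:
        # Both masks empty: treated here as perfect agreement (assumed convention).
        return 1.0
    return 2.0 * intersection / total


def relative_volume_difference(
    reference: Sequence[int], predicted: Sequence[int]
) -> float:
    """Signed percent volume difference; negative means under-segmentation."""
    reference_voxels = sum(1 for r in reference if r)
    if reference_voxels == 0:
        raise ValueError("reference mask is empty")
    predicted_voxels = sum(1 for p in predicted if p)
    return 100.0 * (predicted_voxels - reference_voxels) / reference_voxels


# Toy example: the automated mask misses one of four manually labelled voxels.
manual = [1, 1, 1, 1, 0, 0]
automated = [1, 1, 1, 0, 0, 0]
print(dice_similarity(manual, automated))
print(relative_volume_difference(manual, automated))
```

A negative relative volume difference alongside a high DSC corresponds to the mild under-segmentation pattern noted in the limitations.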

Data availability

The datasets used and analyzed during the current study are available from the corresponding author, Silvia Blemker, on reasonable request.

Abbreviations

AI: artificial intelligence
3D: three-dimensional
ROIs: regions of interest
BAMIC: British Athletics Muscle Injury Classification
TOI: time of injury
RTS: return to sport
EDEMA: T2 scan and associated labeled edema
HTEN: T1 scan and associated labeled hamstring muscles and tendons
CSA: cross-sectional area
DSC: Dice similarity coefficient

References

Maniar, N. et al. Incidence and prevalence of hamstring injuries in field-based team sports: A systematic review and meta-analysis of 5952 injuries from over 7 million exposure hours. British Journal of Sports Medicine 57 (2023). https://doi.org/10.1136/bjsports-2021-104936
Shankar, P. R., Fields, S. K., Collins, C. L., Dick, R. W. & Comstock, R. D. Epidemiology of high school and collegiate football injuries in the United States, 2005–2006. Am. J. Sports Med. 35, 1295–1303 (2007).
Luxenburg, D. et al. Hamstring injuries in the National Football League: an epidemiological study. J. Orthop. 35, 111–114 (2023).
Hallén, A. & Ekstrand, J. Return to play following muscle injuries in professional footballers. J Sports Sci 32, 1229–1236 (2014).
Wangensteen, A. et al. Intra- and interrater reliability of three different MRI grading and classification systems after acute hamstring injuries. Eur J. Radiol 89, 182–190 (2017).
Wangensteen, A. et al. New MRI muscle classification systems and associations with return to sport after acute hamstring injuries: a prospective study. Eur Radiol 28, 3532–3541 (2018).
Hamilton, B. et al. Excellent reliability for MRI grading and prognostic parameters in acute hamstring injuries. Br J. Sports Med 48, 1385–1387 (2014).
Chan, O., Del Buono, A., Best, T. M. & Maffulli, N. Acute muscle strain injuries: A proposed new classification system. Knee Surgery, Sports Traumatology, Arthroscopy 20 (2012). https://doi.org/10.1007/s00167-012-2118-z
Monte, J. R. et al. Diffusion tensor imaging and quantitative T2 mapping to monitor muscle recovery following hamstring injury. NMR Biomed 36, e4902 (2023).
Wille, C. M. et al. Association of quantitative diffusion tensor imaging measures with time to return to sport and reinjury incidence following acute hamstring strain injury. J. Biomech. 163, 111960 (2024).
Crema, M. D. et al. Hamstring injuries in professional soccer players: extent of MRI-Detected edema and the time to return to play. Sports Health 10, 75–79 (2018).
Heiderscheit, B. C. et al. The development of a hamstring injury (HAMIR) index to mitigate injury risk through innovative imaging, biomechanics, and data analytics: protocol for an observational cohort study. BMC Sports Sci. Med. Rehabil 14, 128 (2022).
Fedorov, A. et al. 3D Slicer as an image computing platform for the quantitative imaging network. Magn. Reson. Imaging. 30, 1323–1341 (2012).
Avants, B. B. et al. A reproducible evaluation of ANTs similarity metric performance in brain image registration. Neuroimage 54, 2033–2044 (2011).
Ni, R., Meyer, C. H., Blemker, S. S., Hart, J. M. & Feng, X. Automatic segmentation of all lower limb muscles from high-resolution magnetic resonance imaging using a cascaded three-dimensional deep convolutional neural network. J. Med. Imaging. 6, 1 (2019).
Agosti, A. et al. Deep learning for automatic segmentation of thigh and leg muscles. Magn. Reson. Mater. Phys., Biol. Med. 35, 467–483 (2022).
Day, M. A. et al. Correlation of player and imaging characteristics with severity and missed time in National Football League professional athletes with hamstring strain injury: A retrospective review. Am. J. Sports Med. 52, 2709–2717 (2024).
Biglands, J. D. et al. MRI in acute muscle tears in athletes: can quantitative T2 and DTI predict return to play better than visual assessment? Eur Radiol 30, 6603–6613 (2020).
Dahlqvist, J. R., Salim, R., Thomsen, C. & Vissing, J. A quantitative method to assess muscle edema using short TI inversion recovery MRI. Sci Rep 10, 7246 (2020).
Pons, C. et al. Quantifying skeletal muscle volume and shape in humans using MRI: A systematic review of validity and reliability. PLoS One 13, e0207847 (2018).
Hostin, M. A. et al. The impact of fatty infiltration on MRI segmentation of lower limb muscles in neuromuscular diseases: A comparative study of deep learning approaches. J. Magn. Reson. Imaging 58, 1826–1835 (2023).
Lin, Z. et al. Automatic segmentation of skeletal muscles from MR images using modified U-Net and a novel data augmentation approach. Front Bioeng. Biotechnol. 12, 1355735 (2024).
Henson, W. H., Mazzà, C. & Dall'Ara, E. Deformable image registration based on single or multi-atlas methods for automatic muscle segmentation and the generation of augmented imaging datasets. PLoS One 18, e0273446 (2023).
Frouin, A. et al. Validity and reliability of 3-D ultrasound imaging to measure hamstring muscle and tendon volumes. Ultrasound Med. Biol 49, 1457–1464 (2023).
Namiri, N. K. et al. Deep learning for hierarchical severity staging of anterior cruciate ligament injuries from MRI. Radiol Artif. Intell 2, e190207 (2020).
Suskens, J. J. M. et al. Effect of two eccentric hamstring exercises on muscle architectural characteristics assessed with diffusion tensor MRI. Scand J. Med. Sci. Sports 33, 393–406 (2023).
Liu, J. et al. Improved subcutaneous edema segmentation on abdominal CT using a generated adipose tissue density prior. Int J. Comput. Assist. Radiol. Surg 19, 443–448 (2024).
Zhang, X., Noga, M. & Punithakumar, K. Fully Automated Deep Learning Based Segmentation of Normal, Infarcted and Edema Regions from Multiple Cardiac MRI Sequences. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) vol. 12554 LNCS (2020).
Xie, K., Yang, J., Zhang, Z. G. & Zhu, Y. M. Semi-automated brain tumor and edema segmentation using MRI. Eur J. Radiol 56, 12–19 (2005).
Wang, F. et al. Assessment of idiopathic inflammatory myopathy using a deep learning method for muscle T2 mapping segmentation. Eur. Radiol. 33, 2350–2357 (2023).

Acknowledgements

Acknowledgements: Guarantors of integrity of entire study, L.R., O.D., S.S.B.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, L.R., R.H, M.B.; experimental/clinical studies, S.S.B, B.H, D.O.; statistical analysis, L.R., O.D. and manuscript editing, all authors. HAMIR Consortium Authors: Claire Tanaka, Naoaki Ito, Mikel Joachim, Jack Martin, Christa Wille, Malorie Wilwand, Emma Remington, Brett Mortensen, Casey Metoyer, Jonathan Hauenstein, Josh Huff, Jordan Reyes, Nicholas Port, Joseph Hart, Stephanie Kliethermes, Sarah Sund, Dave Berkoff, Nirav Maniar, Yi-Chung Lin, Ryan Timmins, Jack Hickey, Wayne Johnson, John Wagle.

Funding

This project was made possible through the efforts of individuals affiliated with the HAMIR study with funding from the National Football League, with additional funding from the National Basketball Association & GE HealthCare Orthopedics and Sports Medicine Collaboration. The funding agencies had no role in the study design or in the collection, analysis, or interpretation of the data presented, and were not involved in the writing of the manuscript or in the decision to submit the manuscript for publication.

Author information

Lara Riem, PhD, and Olivia DuCharme, ME, are co-first authors.

Authors and Affiliations

Springbok Analytics, 100 West South Street, Suite 1E, Charlottesville, VA, 22902, USA
Lara Riem, Olivia DuCharme, Ashley Coggins, Allison Kenney, Matthew Cousins, Xue Feng, Raina Hein, Max Buford & Silvia S. Blemker
Australian Catholic University, Melbourne, Australia
David Opar
University of Wisconsin-Madison, Madison, WI, USA
Kenneth Lee & Bryan Heiderscheit
University of Virginia, Charlottesville, VA, USA
Silvia S. Blemker

Contributions

Guarantors of integrity of entire study, L.R., O.D., S.S.B.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, L.R., R.H, M.B.; experimental/clinical studies, S.S.B, B.H, D.O.; statistical analysis, L.R., O.D. and manuscript editing, all authors.

Corresponding author

Correspondence to Silvia S. Blemker.

Ethics declarations

Competing interests

Lara Riem (Director of AI and Data Science), Olivia DuCharme (Data Scientist), Ashley Coggins (Senior Product Analyst), Allison Kenney (Student Intern, Segmentation Engineer), Matthew Cousins (Head of Product Delivery), Xue Feng (Chief Technical Officer), Raina Hein (Student Intern, Segmentation Engineer), Max Buford (Student Intern, Segmentation Engineer), Bryan Heiderscheit (Advisor of Performance and Sport Science), and Silvia Blemker (Chief Scientific Officer) are full time/contractor employees of Springbok. Lara Riem, Olivia DuCharme, Matthew Cousins, Ashley Coggins, Bryan Heiderscheit, Xue Feng, and Silvia Blemker have stock options or own stock in the company. Kenneth Lee, David Opar, and the HAMIR Consortium have no competing interests to disclose.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Riem, L., DuCharme, O., Coggins, A. et al. A deep learning algorithm for automatic 3D segmentation and quantification of hamstrings musculotendon injury from MRI. Sci Rep 15, 33497 (2025). https://doi.org/10.1038/s41598-025-16926-1

Received: 17 March 2025
Accepted: 20 August 2025
Published: 29 September 2025
Version of record: 29 September 2025
DOI: https://doi.org/10.1038/s41598-025-16926-1
