Introduction

Radiology is widely utilized in forensic anthropology as a non-invasive tool for various applications. Beyond its critical role in analyzing perimortem injuries—particularly through post-mortem computed tomography (PMCT)—radiological imaging also contributes to biological profiling (e.g., sex and age estimation), identification of pathological conditions, documentation of skeletal trauma, and assessment of post-mortem changes1,2. Computed tomography (CT) imaging offers significant advantages, including the ability to collect analytical data without damaging tissues, facilitate indirect transformation and reconstruction of targets, and enable repeated examinations and analyses3. Consequently, CT imaging has been increasingly used in archeology and forensic anthropology research, with numerous studies being published in recent years. Today, it is a routine tool in the forensic sciences4,5,6.

Sex estimation is a fundamental step in individual identification, and its accuracy depends on the integrity and preservation of skeletal remains. Both the skull and the postcranial skeleton can be used for this purpose, with the skull and pelvic bones being particularly useful due to their pronounced sexual dimorphism7,8. However, in actual forensic contexts, human remains are most often recovered in a damaged or fragmentary state due to various environmental and situational factors (such as falls in mountainous terrain, body disposal after traffic accidents, and scavenging by animals), while the recovery of intact long bones is relatively rare9,10. Furthermore, the skull and pelvic bones are often found in partial or fragmented states, reducing the accuracy of sex estimation7,11. This necessitates the development of methods that use various single or multiple bones to estimate sex accurately, particularly when these primary bones are unavailable12.

Long bones are commonly used in metric methods for sex estimation due to their ease of measurement and better preservation. Combining single or multiple long bones can improve the accuracy of sex estimation13. Among these, the humerus is frequently studied in forensic anthropology because it is often discovered in an intact state and has characteristics suitable for metric-based sex estimation14,15. However, osteologists have found it difficult to devise methods that adequately capture subtle, visually apparent sexually dimorphic shape variations16, so the precise measurement of sexually dimorphic traits remains a significant challenge.

Volumetric studies for sex estimation using the humerus are limited, although several three-dimensional CT-based analyses have been reported for other skeletal elements such as the skull, pelvis, femur, sternum, and scapula17,18,19,20. Also, Shim et al.21 emphasized the importance of three-dimensional measurements using volume, rather than conventional two-dimensional methods such as length, width, depth, and circumference, in studies of the humerus. This is because the humerus has a complex three-dimensional structure composed of irregular curves and surfaces. However, Shim’s study relied on intact bones, where the maximum length could be directly measured, and thus its sex estimation method cannot be readily applied to fragmentary bones. Currently, sex estimation methods applicable to fragmentary humeral remains are extremely limited.

Therefore, the aim of this study was to reconstruct three-dimensional images of the humerus from Korean CT scans, to develop sex estimation functions based on volume—a three-dimensional metric applicable even to fragmentary bones—and to validate the effectiveness of this approach in sex estimation.

Methods

This study analyzed 600 CT images of adult Korean humeri without fractures or deformities, obtained from individuals aged ≥ 20 years who underwent forensic autopsy at the Seoul Institute of Forensic Science, National Forensic Service (NFS). Personal information and sex ratios of the individuals were anonymized for the investigators. The dataset included 360 male and 240 female individuals, with an average age of 57.08 years for male and 48.18 years for female individuals (Table 1).

Table 1 Descriptive statistics of the individual age.

The PMCT scans were performed using a SOMATOM AS+ (Siemens Healthcare, Erlangen, Germany) scanner used for forensic examinations. The scanning protocol followed the whole-body scanning method described by Jung et al.6. DICOM files generated from the PMCT scans were processed using MIMICS 25.0 (Materialise NV, Leuven, Belgium) software. Humerus images were segmented using a threshold range of 226–3071 HU (Hounsfield Units), and internal voids within the humeri were filled before converting the data into computer-aided design models (Fig. 1A). The reconstructed humeri were further processed using 3-matic 17.0 (Materialise NV, Leuven, Belgium). A standardized axis was established based on the maximum length of the humerus, enabling the measurement of the maximum length and volumetric parameters of each region.
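The segmentation and volume measurements described above were performed in MIMICS and 3-matic. Purely as an illustration of the underlying idea, the following minimal sketch counts the voxels that fall inside the reported 226–3071 HU window and multiplies the count by the voxel volume. The function name, the nested-list input, and the voxel spacing are hypothetical; void filling and conversion to a surface model are not reproduced here.

```python
from itertools import chain

HU_MIN, HU_MAX = 226, 3071  # threshold window reported in the text


def bone_volume_mm3(hu_slices, spacing_mm):
    """Approximate volume as (voxels inside the HU window) x (voxel volume).

    hu_slices  : nested lists indexed [slice][row][column] of HU values
                 (hypothetical input format, not the MIMICS data model).
    spacing_mm : (dz, dy, dx) voxel spacing in millimetres (assumed values).
    """
    dz, dy, dx = spacing_mm
    # Flatten slices -> rows -> individual voxels.
    voxels = chain.from_iterable(chain.from_iterable(hu_slices))
    count = sum(1 for hu in voxels if HU_MIN <= hu <= HU_MAX)
    return count * dz * dy * dx


# Toy 2 x 2 x 2 volume: four voxels (300, 226, 3071, 1500) lie in the window.
toy = [[[100, 300], [226, 4000]],
       [[3071, -50], [1500, 0]]]
volume = bone_volume_mm3(toy, (1.0, 0.5, 0.5))
```

Voxel counting tends to differ slightly from the volume of a smoothed surface model, so values obtained this way should not be expected to match the software output exactly.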

Fig. 1

The right humerus volume measurement range of the intact bone (A) and segmented images used for the 20% volume measurement (B); the right humerus volume measurement range of the fragmentary bone (C) and segmented images used for the 5 cm volume measurement (D).

A total of 500 CT images were randomly selected to develop regression equations for sex estimation, and the remaining 100 images were used to validate the performance of the equations.
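The random allocation into 500 development and 100 validation images could be reproduced along the following lines. This is a sketch only: the text does not state how the randomization was carried out, so the fixed seed, the use of integer case identifiers, and the function name are assumptions.

```python
import random


def split_development_validation(case_ids, n_validation=100, seed=0):
    """Shuffle case identifiers and hold out n_validation of them.

    The seed is an assumption that makes the split repeatable; it is not
    taken from the study.
    """
    ids = list(case_ids)
    rng = random.Random(seed)  # local generator, leaves global state untouched
    rng.shuffle(ids)
    return ids[n_validation:], ids[:n_validation]


# Hypothetical identifiers 0..599 standing in for the 600 CT images.
development, validation = split_development_validation(range(600))
```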

Measurements and analysis of the humerus

The measurements and analysis of the humerus were conducted using three methods:

  1. 2D sex estimation based on the maximum length of the humerus (Humax).

  2. Volumetric sex estimation by region, based on the maximum length of the intact humerus (20% vol/maximum length of the intact humerus).

  3. Sex estimation for fragmentary humeri, when one end of the humerus was fractured, or its maximum length could not be measured (5 cm vol/fragmentary humerus).

For the 20% vol/maximum length of intact humerus method, the maximum length of the humerus was first measured, followed by the volumetric assessment of three regions: head, mid-shaft, and distal regions. The head region was defined as the volume from the apex of the humeral head to 20% of the maximum length toward the distal end. The distal region was defined as the volume from the distal end of the humerus to 20% of the maximum length toward the head. The mid-shaft region was defined as the volume centered at 50% of the maximum length, including 10% of the maximum length on either side, for a total of 20% of the maximum length (Fig. 1B).

For the 5 cm vol/fragmentary humerus method, the head region was defined as the volume from the apex of the humeral head extending 5 cm distally, and the distal region was defined as the volume from the distal end extending 5 cm proximally. Due to the complexity of defining a consistent measurement range, the mid-shaft region was not measured for this method (Fig. 1C, D).
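The region definitions for both methods can be expressed as intervals along the standardized axis. The sketch below is illustrative: positions are given in millimetres, the function names are hypothetical, and the 300 mm length is an example value rather than a measurement from the study.

```python
def intact_regions(max_length_mm, pct=20):
    """Return (start, end) positions in mm, measured from the apex of the head.

    Each region spans pct% of the maximum length; the mid-shaft region is
    centred at 50% of the maximum length.
    """
    span = max_length_mm * pct / 100
    centre = max_length_mm * 50 / 100
    return {
        "head": (0.0, span),
        "mid_shaft": (centre - span / 2, centre + span / 2),
        "distal": (max_length_mm - span, max_length_mm),
    }


def fragment_region(depth_mm=50.0):
    """Return (start, end) in mm, measured inward from the preserved end.

    The same 5 cm interval applies whether the preserved end is the apex of
    the humeral head or the distal end, so no maximum length is needed.
    """
    return (0.0, depth_mm)


# Example with a hypothetical 300 mm humerus.
regions = intact_regions(300.0)
```

For the 300 mm example the head region covers 0 to 60 mm, the mid-shaft region 120 to 180 mm, and the distal region 240 to 300 mm. The fragment interval depends only on the preserved end, which is why it remains usable when the maximum length cannot be measured.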

All 600 CT images were analyzed using 3-matic 17.0 (Materialise NV, Leuven, Belgium). The humeri were categorized into right and left sides, and their maximum lengths and regional volumes were measured. For the 100 images used for sex estimation, two independent researchers performed the measurements, and inter-rater reliability was assessed by calculating the Technical Error of Measurement (TEM) (Table 2). Cross-validation was not conducted. Statistical analyses were conducted using SPSS software (version 27; IBM Corp., Armonk, NY, USA). Independent t-tests were used to assess sex differences, and paired t-tests were applied to evaluate differences between right and left humeri (Table 3). Logistic regression analysis was performed using data from 500 CT images to develop sex estimation models (Table 4).

Table 2 Descriptive statistics of the right and left humerus and subregions.
Table 3 Intra-observer reliability statistics by each region.
Table 4 Results of logistic regression analysis based on the volume of each region of the humerus.

Ethics declarations

All experimental protocols performed in this study were approved by the Institutional Review Board (IRB) of the National Forensic Service (IRB approval number: 906-230131-HR-010-01). All procedures and methodology performed for this study involving human subjects were in accordance with the guidelines and regulations of the institutional research committee and with the 1964 Helsinki Declaration and its later amendments. The need for informed consent from the next of kin was waived because all autopsy procedures at the NFS Seoul Institute are performed under a court-approved warrant. Approval from the NFS IRB for a waiver of written informed consent was also obtained.

Results

This study utilized a total of 600 three-dimensionally reconstructed humeri from Korean individuals to measure the maximum length, the 20% volume relative to the maximum length of the intact humerus, and the 5 cm volume segments of fragmentary humeri. The results showed statistically significant differences between males and females across all measured regions. In the comparison between the left and right humeri, statistically significant differences were observed in all regions except for the distal region and maximum length of the left humerus (Table 3).

Logistic regression analyses were conducted using 500 CT images to develop sex estimation models based on the humeral measurements (Table 4), from which regression equations for sex estimation were derived (Table 5). The validity of all derived equations was subsequently evaluated using the remaining 100 samples (Table 6).
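A single-predictor logistic regression equation of this kind is typically applied as shown in the sketch below. The coefficients are hypothetical placeholders and are not the values in Table 5. The coding of male as the positive class and the 0.5 probability cut-off are also assumptions, since neither is stated in the text.

```python
import math


def male_probability(volume, intercept, slope):
    """Logistic model: p = 1 / (1 + exp(-(intercept + slope * volume)))."""
    z = intercept + slope * volume
    return 1.0 / (1.0 + math.exp(-z))


def classify(volume, intercept, slope, cutoff=0.5):
    """Assign 'male' when the predicted probability reaches the cut-off."""
    p = male_probability(volume, intercept, slope)
    return "male" if p >= cutoff else "female"


# Hypothetical coefficients for illustration only (not from Table 5).
B0, B1 = -20.0, 0.5

# With a 0.5 cut-off the decision boundary is the volume where z = 0.
threshold_volume = -B0 / B1
```

With a single predictor and a 0.5 cut-off, classification reduces to comparing the measured volume with the sectioning point `-B0 / B1`, which is how such equations are often applied in casework.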

Table 5 Regression equations for the volume of each region of the humerus.
Table 6 Sex estimation accuracy of each region in the right and left humerus.

Maximum length

The maximum length of the humerus showed a statistically significant difference between sexes; however, no significant difference was observed between the right and left sides. Sex estimation based on maximum length demonstrated an accuracy of 76% for the right humerus (85.45% for males, 64.44% for females) and 75% for the left humerus (83.64% for males, 64.44% for females).
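The relationship between the overall and sex-specific accuracies can be checked with simple arithmetic. The counts used below (55 males and 45 females in the 100-image validation set, with 47 and 29 correct classifications on the right side and 46 and 29 on the left) are not stated in the text. They are inferred here as the whole-number counts that reproduce the reported percentages, and should be read as an assumption.

```python
def accuracy_summary(correct_male, n_male, correct_female, n_female):
    """Return (overall, male, female) accuracy in percent, rounded to 2 dp."""
    overall = 100 * (correct_male + correct_female) / (n_male + n_female)
    male = 100 * correct_male / n_male
    female = 100 * correct_female / n_female
    return round(overall, 2), round(male, 2), round(female, 2)


# Inferred counts (assumption): 55 males and 45 females in the validation set.
right = accuracy_summary(47, 55, 29, 45)
left = accuracy_summary(46, 55, 29, 45)
```

Under these inferred counts, the right side gives 76% overall (85.45% for males, 64.44% for females) and the left side 75% overall (83.64% for males, 64.44% for females), matching the figures above.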

20% volume / maximum length of intact humerus

In the three-dimensional volumetric analysis based on 20% segment volumes relative to the maximum humeral length, statistically significant differences between sexes were observed in all regions. When comparing the right and left sides, significant differences were also found across all regions, except for the distal region of the humerus, where no statistically significant difference was observed.

In terms of sex estimation accuracy, the head region showed 93% accuracy on the right side (98.18% for males, 86.67% for females) and 91% on the left side (94.55% for males, 86.67% for females). For the mid-shaft region, the accuracy was 93% on the right (92.73% for males, 93.33% for females) and 92% on the left (96.36% for males, 86.67% for females). In the distal region, the accuracy was 93% on the right (94.55% for males, 88.89% for females) and 92% on the left (94.55% for males, 88.89% for females).

5 cm volume / fragmentary humerus

In cases where one end of the humerus was fragmentary and only a 5 cm segment was available for volumetric analysis, statistically significant differences were observed in all regions, both between sexes and between the right and left sides.

In terms of sex estimation accuracy, the head region showed an accuracy of 91% on the right side (98.18% for males, 82.22% for females) and 93% on the left side (98.18% for males, 86.67% for females). For the distal region, the accuracy was 92% on the right (96.36% for males, 86.67% for females) and 89% on the left (98.18% for males, 80.00% for females).

Both volumetric methods, the 20% volume of the intact humerus and the 5 cm volume of the fragmentary humerus, demonstrated high sex estimation accuracy across all measured regions (ranging from 89 to 93%). However, with the exception of the right mid-shaft region, a consistent trend was observed in which females showed lower classification accuracy than males, with a difference of approximately 6–10%. In contrast, the two-dimensional method using maximum length yielded the lowest accuracy and exhibited a marked disparity of about 20% in classification performance between males and females.

Inter-observer reliability

To ensure measurement consistency and reliability, a quantitative assessment of inter-observer agreement was conducted. Two independent observers (Observer 1 and Observer 2) performed repeated measurements across all defined anatomical regions, and the results were compared. Inter-observer reliability was evaluated based on the absolute mean difference, Technical Error of Measurement (TEM), relative TEM (%TEM), and reliability coefficient (R).
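The text does not give the formulas used for these statistics. The sketch below assumes the definitions commonly used in anthropometry for two observers: TEM = sqrt(sum(d^2) / 2n), %TEM = 100 x TEM / grand mean, and R = 1 - TEM^2 / SD^2, with SD^2 taken as the sample variance of all measurements. The function name and the example values are hypothetical.

```python
import math
from statistics import mean, variance


def tem_statistics(obs1, obs2):
    """Inter-observer statistics for paired measurements from two observers.

    Assumed definitions (not stated in the study):
      TEM  = sqrt(sum(d_i^2) / (2 * n))
      %TEM = 100 * TEM / grand mean of all measurements
      R    = 1 - TEM^2 / sample variance of all measurements
    """
    if len(obs1) != len(obs2) or len(obs1) < 2:
        raise ValueError("need two equally long series with at least 2 values")
    n = len(obs1)
    diffs = [a - b for a, b in zip(obs1, obs2)]
    tem = math.sqrt(sum(d * d for d in diffs) / (2 * n))
    pooled = list(obs1) + list(obs2)
    return {
        "mean_abs_diff": mean([abs(d) for d in diffs]),
        "tem": tem,
        "rel_tem_pct": 100 * tem / mean(pooled),
        "r": 1 - tem ** 2 / variance(pooled),
    }


# Hypothetical maximum lengths (mm) recorded by two observers.
stats = tem_statistics([300.0, 310.0, 290.0, 320.0],
                       [302.0, 310.0, 288.0, 320.0])
```

Identical series from both observers give a TEM of zero and an R of one, which corresponds to the perfect-agreement case.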

The analysis revealed minimal differences between observers across all anatomical regions. Absolute mean differences ranged from 0.00 to 5.40 units (measurement units to be specified in the Methods section), and TEM values ranged from 0.000 to 9.695 units, indicating low levels of measurement error associated with the defined anatomical landmarks. Relative TEM values ranged from 0.000 to 0.029%, all well below the stringent threshold of 0.03%. In several regions, the TEM was recorded as zero, indicating perfect agreement between observers for those specific measurements. The reliability coefficient (R) exceeded 0.999 across all anatomical regions, confirming an exceptionally high level of inter-observer agreement and the overall consistency of the obtained measurements.

A similar reliability analysis was conducted for the maximum length measurements of the humerus (right and left sides). The absolute mean differences between the two observers were 6.09 units for the right side and 6.23 units for the left side, with corresponding TEM values of 4.31 and 4.41 units, respectively. The relative TEM values were 1.43% for the right side and 1.47% for the left side. The reliability coefficient (R) for both sides was 0.947. Although the relative TEM values for maximum length were slightly higher than those for regional measurements, they remained below 2%, indicating acceptable reliability and consistent agreement between observers for these measurements as well.

Discussion

This study aimed to develop new sex estimation equations based on volumetric measurements of various regions of the humerus—both intact and fragmentary—using three-dimensional (3D) reconstructions derived from CT images.

The inter-observer reliability analysis demonstrated high precision and reproducibility of the measurement procedures used in this study. Minimal absolute differences and %TEM values below 0.03% indicate consistent anatomical landmark identification. Although maximum length measurements showed slightly higher variability, their relative TEM values remained below 2%, indicating acceptable reliability. The lack of significant measurement error may be due to methodological differences: while region extraction required manual input, percentage-based and interval-based measurements were automated using 3-matic software, reducing observer-related variation. Overall, the results support the reliability of the measurement protocol and the validity of the study’s findings.

The statistical significance of sex differences in the measurements of each region of the Korean humerus, as well as differences between right and left humeri, is presented in Table 3. The significant sex differences observed in this study are believed to be due to differences in the secretion of sex hormones, particularly testosterone, during puberty, which promotes muscle development in males. This results in increased mechanical loading on bones, leading to more robust and thicker skeletal structures22. The observed differences between right and left humeri are likely influenced by the sociocultural tendency toward right-handedness in Korean society, where individuals predominantly use their right hand from early childhood23,24. However, this cultural handedness bias appears to have had no significant effect on the maximum length of the humerus. Additionally, differences in left–right humeral measurements between the 100 samples used for sex estimation and the 500 used for regression modeling may be attributed to random sampling variability, such as differences in sex ratio or age distribution between the two datasets. The humerus is a valuable skeletal element for both metric and non-metric sex estimation methods, and population-specific differences highlight the need to develop sex estimation techniques tailored to each group. Moreover, because in many forensic cases only one side of the humerus (either the right or the left) is recovered from skeletal remains, it is essential to establish separate sex estimation methods for each side. Previous researchers25,26,27,28,29 have conducted sex estimation studies using humeri from diverse ethnic groups, focusing on two-dimensional (2D) measurements of the head, shaft, and distal regions. These studies reported variations in classification accuracy across populations.
In Korea, Lee et al.30 examined dry humeri and found that among several 2D head measurements—including the vertical diameter, maximum diameter, and transverse diameter of the humeral head—the vertical diameter was the most reliable single indicator, yielding an overall accuracy of 87% (89.8% for males and 81.3% for females). In the distal region, the condylar breadth and epicondylar breadth achieved accuracies of 82.9% (83.7% for males and 81.3% for females) and 74.7% (77.6% for males and 68.8% for females), respectively. When comparing the sex classification accuracy of maximum humeral length between the present study and that of Lee et al., the accuracy reported by Lee for the right humerus was 80.8% (85.7% for males and 70.8% for females), whereas in the present study, it was 76% (85.45% for males and 64.44% for females). While the classification accuracy for males was similar between the two studies, the accuracy for females was lower in the present study. The mean values of maximum humeral length for both males and females were higher in the present study than in Lee’s, which is presumed to be due to differences in sample size and age distribution between the two datasets.

Human bones are three-dimensional (3D) structures characterized by irregular curvatures and complex surfaces, making it difficult to accurately capture spatial information using only two-dimensional (2D) measurements. Kranioti et al.31 examined sex estimation based on the morphology of the humeral head and distal region in a Cretan population. When relying solely on shape, they reported classification accuracies of 71% for the head and 73% for the distal region. These accuracies increased to 86.5% and 85.6%, respectively, when size was incorporated. When both shape and size were combined, the accuracies further improved to 89.6% for the head and 89.7% for the distal region. Similarly, López-Lázaro et al.32 studied the humeri of a Spanish population and found that sex estimation based on shape alone achieved accuracies ranging from 54.95 to 77.92% for males and from 56.87 to 71.78% for females. With the addition of size, the accuracy increased to 81.86–94.92% for males and 84.08–94.88% for females. These findings highlight the limitations of conventional 2D measurement techniques—which are confined to linear dimensions such as length, width, depth, and circumference—and suggest that reconstructing CT images into 3D and combining shape and size metrics can significantly improve the reliability of sex estimation. Furthermore, Liu et al.33 emphasized that 3D imaging not only facilitates repeated measurements with higher precision but also that the sex estimation equations derived from such methods are best suited to the specific population under study, as they most accurately reflect that population’s unique characteristics. In this context, the present study captures the 3D nature of the human skeleton by analyzing 3D images of Korean humeri. The analysis yielded high sex estimation accuracies, ranging from 89 to 93% across all regions of both left and right humeri, underscoring the effectiveness of population-specific, 3D-based approaches in forensic identification.

The vertical diameter, maximum diameter, and transverse diameter measured by Lee et al.30 correspond to the 20% volumetric segment of the intact right humeral head in the present study, which achieved a sex estimation accuracy of 93%. The distal measurements reported by Lee et al.—condylar breadth and epicondylar breadth—align with the distal 20% volume in this study, which yielded an accuracy of 92%. Although statistically significant measurement differences were observed between the right and left sides, no substantial differences were found in sex estimation accuracy. Compared to the results of Lee et al., the present study achieved approximately 5% higher accuracy in the head region and 10% higher in the distal region. Although the present study analyzed only the maximum length of the 3D-reconstructed humerus and did not include a direct comparison with conventional two-dimensional (2D) measurements such as humeral head diameter or distal breadth, these traditional 2D indicators, being single linear measurements, do not fully capture the morphological complexity of the humerus. Therefore, if comparisons had been made using a wider range of 2D measurements, the relative superiority of the volumetric analysis method would likely have been demonstrated more clearly.

These findings are consistent with those of Kranioti et al.31 and López-Lázaro et al.32, who emphasized that incorporating more comprehensive information rather than relying on fragmentary portions alone improves the accuracy of sex determination of the humerus. Furthermore, the results of Lee et al.30 demonstrated higher classification accuracy for males than for females in various humeral regions, a trend that was similarly observed in the present study. Kim et al.34 reported that Korean male crania were smaller than those of European males, while Korean female crania were larger than those of Japanese and Indian females. Likewise, Lee et al.30 found that the metric values of Korean male humeri tended to be smaller than those of other population groups. In the present study as well, the number of females whose measured values exceeded the sex classification threshold derived from the regression equation was greater than the number of males whose values fell below the threshold. Taken together, these findings suggest that the lower sex estimation accuracy for Korean females compared to males may be attributable to classification errors arising from population-specific morphological characteristics inherent to the Korean population.

In addition, the novel 5 cm volumetric analysis for fragmentary humeri introduced in this study also demonstrated high classification accuracy. The head region achieved an accuracy of 91% on the right side and 93% on the left, while the distal region showed an accuracy of 92% on the right and 89% on the left. The difference in sex estimation accuracy between males and females was also comparable to that observed in intact humeri. This indicates that even when the humerus is recovered in a fragmentary state—most commonly due to various environmental and situational factors—the measurement method proposed in this study allows for high-accuracy sex estimation.

In conclusion, this study demonstrated that three-dimensional (3D) analysis of the humerus provides more comprehensive information than traditional two-dimensional (2D) methods, resulting in improved accuracy and reliability for sex estimation. However, one limitation of this study is the exclusion of the midshaft region of the humerus from volumetric analysis in fragmentary cases. When both the head and distal regions are damaged, it is difficult to define consistent anatomical landmarks in the midshaft, making it impossible to establish standardized measurement boundaries. As a result, the sex estimation models for fragmentary humeri in this study were limited to the head and distal regions.

Nevertheless, the measurement methodology proposed herein offers a novel approach for sex estimation in skeletal remains with complex morphology, and it may yield even higher accuracy when applied to populations exhibiting greater sexual dimorphism than that observed in Koreans.